Abstract

In this paper the stability theorem of Borkar and Meyn is extended to include the case when the mean field is a set-valued map. Two different sets of sufficient conditions are presented that guarantee the 'stability and convergence' of stochastic recursive inclusions. Our work builds on the works of Benaïm, Hofbauer and Sorin as well as Borkar and Meyn. As a corollary to one of the main theorems, a natural generalization of the Borkar and Meyn Theorem follows. In addition, the original theorem of Borkar and Meyn is shown to hold under slightly relaxed assumptions. As an application of one of the main theorems we discuss a solution to the 'approximate drift problem'. Finally, we analyze the stochastic gradient algorithm with "constant-error gradient estimators" as yet another application of our main result.


1 Introduction 

Consider the following recursion in $\mathbb{R}^d$ ($d \ge 1$):

$$x_{n+1} = x_n + a(n)\left[h(x_n) + M_{n+1}\right], \quad n \ge 0, \tag{1}$$

where

(i) $h: \mathbb{R}^d \to \mathbb{R}^d$ is a Lipschitz continuous function.

(ii) $a(n) > 0$, for all $n$, is the step-size sequence satisfying $\sum_{n \ge 0} a(n) = \infty$ and $\sum_{n \ge 0} a(n)^2 < \infty$.

(iii) $M_n$, $n \ge 1$, is a sequence of martingale difference terms that constitute the noise.

The stochastic recursion given by (1) is often referred to as a stochastic recursive equation (SRE). A powerful method to analyze the limiting behavior of (1) is the ODE (Ordinary Differential Equation) method. Here the limiting behavior of the algorithm is described in terms of the asymptotics of the solution to the ODE

$$\dot{x}(t) = h(x(t)).$$
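To make the ODE method concrete, the following Python sketch simulates recursion (1) so that the iterates can be compared with the o.d.e. It assumes NumPy and makes illustrative choices that are not taken from this paper: the drift $h(x) = -x$, step sizes $a(n) = 1/(n+1)$, and i.i.d. Gaussian noise playing the role of the martingale difference terms. The helper name `run_sre` is hypothetical.

```python
import numpy as np


def run_sre(h, x0, n_steps, noise_std=1.0, seed=0):
    """Iterate x_{n+1} = x_n + a(n) [h(x_n) + M_{n+1}], i.e. recursion (1).

    Illustrative assumptions: a(n) = 1/(n+1), which satisfies sum a(n) = inf and
    sum a(n)^2 < inf, and i.i.d. zero-mean Gaussian M_{n+1}, which form a
    martingale difference sequence.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for n in range(n_steps):
        a_n = 1.0 / (n + 1)
        m_next = noise_std * rng.standard_normal(x.shape)
        x = x + a_n * (h(x) + m_next)
    return x


# h(x) = -x is Lipschitz, and the o.d.e. dx/dt = -x has the origin as its global attractor.
x_final = run_sre(lambda x: -x, x0=[5.0, -3.0], n_steps=50_000)
print(np.linalg.norm(x_final))  # small: the iterates track the o.d.e. and settle near 0
```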

This method was introduced by Ljung [12] in 1977. For a detailed exposition on the subject and a survey of results, the reader is referred to Kushner and Yin [11] as well as Borkar [10].

In 1996, Benaïm [4] showed that the asymptotic behavior of a stochastic recursive equation can be studied by analyzing the asymptotic behavior of the associated o.d.e., without making any assumptions on the dynamics of the o.d.e. Specifically, he developed sufficient conditions which guarantee that limit sets of the continuously interpolated stochastic iterates are compact, connected, internally chain transitive and invariant sets of the associated o.d.e. The results found in [4] are generalized in [3]; further studies were made by Benaïm and Hirsch in [Bj]. The assumptions made in [1] are sometimes referred to as the 'classical assumptions'. One of the key assumptions used by Benaïm to prove this convergence theorem is the almost sure boundedness of the iterates, i.e., stability of the iterates. In 1999, Borkar and Meyn [13] developed sufficient conditions which guarantee both the stability and convergence of stochastic recursive equations. These assumptions were consistent with those developed in [3]. In this paper we refer to the main result of Borkar and Meyn colloquially as the Borkar-Meyn Theorem. In the same paper [13], several applications to problems from reinforcement learning are also discussed. Andrieu, Moulines and Priouret [1] developed another set of sufficient conditions for SREs, based on global Lyapunov functions, that guarantee the stability and convergence of the iterates.

In 2005, Benaïm, Hofbauer and Sorin [7] showed that the dynamical systems approach can be extended to the situation where the mean fields are set-valued. The algorithms considered were of the form:

$$x_{n+1} = x_n + a(n)\left[y_n + M_{n+1}\right], \quad n \ge 0, \tag{2}$$

where

(i) $y_n \in h(x_n)$ and $h: \mathbb{R}^d \to \{\text{subsets of } \mathbb{R}^d\}$ is a Marchaud map. For the definition of Marchaud maps the reader is referred to Section 2.1.

(ii) $a(n) > 0$, for all $n \ge 0$, is the step-size sequence satisfying $\sum_{n \ge 0} a(n) = \infty$ and $\sum_{n \ge 0} a(n)^2 < \infty$.

(iii) $M_n$, $n \ge 1$, is a sequence of martingale difference terms.

A recursion such as (2) is also called a stochastic recursive inclusion (SRI). Since a differential equation can be seen as a special case of a differential inclusion wherein $h(x)$ is a cardinality-one set for all $x \in \mathbb{R}^d$, SRE (1) can be seen as a special case of SRI (2).
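A minimal Python sketch of recursion (2), under assumptions of our own: a one-dimensional Marchaud map $h(x) = -x + [-0.5, 0.5]$, step sizes $a(n) = 1/(n+1)$ and Gaussian martingale noise (NumPy assumed; all helper names are hypothetical). The point it illustrates is that $y_n$ may be any element of $h(x_n)$: the two selection rules below lead to different limits, both inside $[-0.5, 0.5]$, the attracting set of $\dot{x} \in -x + [-0.5, 0.5]$.

```python
import numpy as np


def h_interval(x, radius=0.5):
    """Illustrative Marchaud map on R: h(x) = -x + [-radius, radius], returned as (low, high)."""
    return -x - radius, -x + radius


def run_sri(select, x0, n_steps, noise_std=1.0, seed=1):
    """Iterate x_{n+1} = x_n + a(n) [y_n + M_{n+1}] with y_n in h(x_n), i.e. recursion (2)."""
    rng = np.random.default_rng(seed)
    x = float(x0)
    for n in range(n_steps):
        a_n = 1.0 / (n + 1)                  # illustrative step sizes satisfying (ii)
        low, high = h_interval(x)
        y_n = select(low, high, rng)         # any selection from the set h(x_n)
        x = x + a_n * (y_n + noise_std * rng.standard_normal())
    return x


def random_pick(low, high, rng):
    return rng.uniform(low, high)            # a random point of h(x_n)


def upper_pick(low, high, rng):
    return high                              # always the largest point of h(x_n)


x_random = run_sri(random_pick, x0=4.0, n_steps=50_000)
x_upper = run_sri(upper_pick, x0=4.0, n_steps=50_000)
# Both limits lie in [-0.5, 0.5], but which point is reached depends on the selection.
print(x_random, x_upper)
```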

The main aim of this paper is to extend the original Borkar-Meyn theorem to the case of stochastic recursive inclusions. We present two overlapping yet different sets of assumptions, in Sections 3.2 and 3.3 respectively, that guarantee the stability and convergence of an SRI given by (2). As a consequence of our main results, Theorems 2 and 3, we present a couple of interesting extensions to the original theorem of Borkar and Meyn in Section 4. Using the frameworks presented herein we provide a solution to the problem of approximate drift in Section 5. For more details on the approximate drift problem the reader is referred to Borkar [10]. In Section 6 we discuss the generality and ease of verifiability of the assumptions, and we also try to explain why they are "natural" in some sense.

Stochastic gradient descent (SGD) is an important method to find minima of (continuously) differentiable functions. When implementing the corresponding approximation algorithm (see (13) in Section 5.2) using gradient estimators, an error is made at each step in calculating the gradient of the objective function. Let us call this error the "approximation error". This is the case when using gradient estimators such as Kiefer-Wolfowitz, simultaneous perturbation stochastic approximation (SPSA) and smoothed functional (SF) schemes (see [...]). If the perturbation parameters of the aforementioned estimators are kept constant, then the "approximation error" is bounded by a constant that depends on the size of the perturbation parameters. We call such estimators constant-error gradient estimators. In Section 5.2 we analyze the stochastic gradient approximation algorithm that uses a constant-error gradient estimator. Using Theorem 3 we show that the iterates are stable and converge to a $\delta$-neighborhood of the minimum set, for a specified $\delta$ ($> 0$). Essentially, our framework gives a threshold $\varepsilon(\delta)$ for the "approximation error" so that the stochastic gradient approximation algorithm is stable and converges to a $\delta$-neighborhood of the minimum set.
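The following Python sketch illustrates what a constant-error gradient estimator does to SGD. Everything in it is an illustrative assumption rather than the setup analyzed in Section 5.2: the objective $f(x) = \log\cosh(x)$ with minimizer 0, a forward-difference estimator with a fixed perturbation $c$, step sizes $a(n) = 1/(n+1)$ and Gaussian noise. Because the estimator's error never vanishes, the iterates settle in a neighborhood of the minimizer rather than at it.

```python
import math
import random


def f(x):
    """Illustrative objective with unique minimizer x* = 0."""
    return math.log(math.cosh(x))


def forward_diff_grad(x, c):
    """Forward-difference estimate of f'(x) with constant perturbation c > 0.

    Since |f''| <= 1, its error |(f(x + c) - f(x)) / c - f'(x)| is at most c / 2:
    a constant-error gradient estimator.
    """
    return (f(x + c) - f(x)) / c


def sgd_constant_error(x0, c, n_steps, noise_std=1.0, seed=2):
    """x_{n+1} = x_n - a(n) [g_c(x_n) + M_{n+1}] with a(n) = 1/(n+1) and Gaussian M_{n+1}."""
    rng = random.Random(seed)
    x = x0
    for n in range(n_steps):
        x -= (forward_diff_grad(x, c) + rng.gauss(0.0, noise_std)) / (n + 1)
    return x


c = 0.2
x_final = sgd_constant_error(x0=3.0, c=c, n_steps=50_000)
# The estimator vanishes at x = -c/2 (there cosh(x + c) = cosh(x)), so the iterates
# settle near -c/2: inside a c/2-neighborhood of the minimizer 0, but not at 0.
print(x_final)
```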


It is worth noting that, prior to this paper, one could only claim that an SGD using constant-error gradient estimators converges to some neighborhood of the minimum set with high probability. Our framework, on the other hand, guarantees almost sure convergence to a small neighborhood of the minimum set.

2 Preliminaries and Assumptions 

2.1 Definitions and Notations 

The definitions and notations used in this paper are similar to those in Benaïm et al. [7], Aubin et al. [2], [3] and Borkar [10]. In this section, we present a few for easy reference.

A set-valued map $h: \mathbb{R}^n \to \{\text{subsets of } \mathbb{R}^m\}$ is called a Marchaud map if it satisfies the following properties:

(i) For each $x \in \mathbb{R}^n$, $h(x)$ is convex and compact.

(ii) (point-wise boundedness) For each $x \in \mathbb{R}^n$, $\sup_{w \in h(x)} \|w\| \le K(1 + \|x\|)$ for some $K > 0$.

(iii) $h$ is an upper-semicontinuous map. We say that $h$ is upper-semicontinuous if, given sequences $\{x_n\}_{n \ge 1}$ (in $\mathbb{R}^n$) and $\{y_n\}_{n \ge 1}$ (in $\mathbb{R}^m$) with $x_n \to x$, $y_n \to y$ and $y_n \in h(x_n)$, $n \ge 1$, it follows that $y \in h(x)$. In other words, the graph of $h$, $\{(x, y) : y \in h(x),\ x \in \mathbb{R}^n\}$, is closed in $\mathbb{R}^n \times \mathbb{R}^m$.
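Condition (iii) can be probed along a single sequence, which is enough to refute (though not to prove) upper semicontinuity. The Python sketch below uses a hypothetical example, the sign map on $\mathbb{R}$ represented by intervals: the single-valued version violates (iii) at 0, while filling in $h(0) = [-1, 1]$ restores a closed graph. The filled-in map also has convex compact values and satisfies (ii) with $K = 1$, so it is a Marchaud map.

```python
def sgn_single(x):
    """Single-valued sign map, returned as a degenerate interval (low, high)."""
    s = (x > 0) - (x < 0)
    return (float(s), float(s))


def sgn_filled(x):
    """Set-valued sign: {-1} for x < 0, [-1, 1] at x = 0, {1} for x > 0."""
    return (-1.0, 1.0) if x == 0 else sgn_single(x)


def contains(interval, y, tol=1e-12):
    low, high = interval
    return low - tol <= y <= high + tol


def passes_graph_test(h, xs, ys, x_lim, y_lim):
    """One instance of (iii): y_n in h(x_n), x_n -> x_lim, y_n -> y_lim => y_lim in h(x_lim)?"""
    assert all(contains(h(x), y) for x, y in zip(xs, ys))
    return contains(h(x_lim), y_lim)


xs = [-1.0 / n for n in range(1, 1000)]   # x_n -> 0 from the left
ys = [-1.0] * len(xs)                    # y_n = -1 lies in h(x_n) for both maps
print(passes_graph_test(sgn_single, xs, ys, 0.0, -1.0))  # False: graph not closed
print(passes_graph_test(sgn_filled, xs, ys, 0.0, -1.0))  # True
```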

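Let $H$ be a Marchaud map on $\mathbb{R}^d$. The differential inclusion (DI) given by

$$\dot{x} \in H(x) \tag{3}$$

is guaranteed to have at least one solution that is absolutely continuous. The reader is referred to [2] for more details. We say that $\mathbf{x} \in \Sigma$ if $\mathbf{x}$ is an absolutely continuous map that satisfies (3). The set-valued semiflow $\Phi$ associated with (3) is defined on $[0, +\infty) \times \mathbb{R}^d$ as:

$$\Phi_t(x) = \{\mathbf{x}(t) \mid \mathbf{x} \in \Sigma,\ \mathbf{x}(0) = x\}.$$

Let $B \times M \subset [0, +\infty) \times \mathbb{R}^d$ and define

$$\Phi_B(M) = \bigcup_{t \in B,\ x \in M} \Phi_t(x).$$

Let $M \subseteq \mathbb{R}^d$; the $\omega$-limit set is defined by $\omega_\Phi(M) = \bigcap_{t \ge 0} \overline{\Phi_{[t, +\infty)}(M)}$. Similarly, the limit set of a solution $\mathbf{x}$ is given by $L(\mathbf{x}) = \bigcap_{t \ge 0} \overline{\mathbf{x}([t, +\infty))}$.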

$M \subseteq \mathbb{R}^d$ is invariant if for every $x \in M$ there exists a trajectory $\mathbf{x}$, entirely in $M$, with $\mathbf{x}(0) = x$. In other words, $\mathbf{x} \in \Sigma$ with $\mathbf{x}(t) \in M$ for all $t \ge 0$.

Let $x \in \mathbb{R}^d$ and $A \subseteq \mathbb{R}^d$; then $d(x, A) := \inf\{\|x - y\| \mid y \in A\}$. We define the $\delta$-open neighborhood of $A$ by $N^\delta(A) := \{x \mid d(x, A) < \delta\}$. The $\delta$-closed neighborhood of $A$ is defined by $\overline{N^\delta}(A) := \{x \mid d(x, A) \le \delta\}$. The open ball of radius $r$ around the origin is represented by $B_r(0)$, while the closed ball is represented by $\overline{B}_r(0)$.

Internally Chain Transitive Set: $M \subset \mathbb{R}^d$ is said to be internally chain transitive if $M$ is compact and for every $x, y \in M$, $\varepsilon > 0$ and $T > 0$ we have the following: there exist $\Phi^1, \ldots, \Phi^n$ that are $n$ solutions to the differential inclusion $\dot{x}(t) \in h(x(t))$, a sequence $x_1 (= x), \ldots, x_{n+1} (= y) \subset M$ and $n$ real numbers $t_1, t_2, \ldots, t_n$ greater than $T$ such that $\Phi^i_{t_i}(x_i) \in N^\varepsilon(x_{i+1})$ and $\Phi^i_{[0, t_i]}(x_i) \subset M$ for $1 \le i \le n$. The sequence $(x_1 (= x), \ldots, x_{n+1} (= y))$ is called an $(\varepsilon, T)$ chain in $M$ from $x$ to $y$.

$A \subseteq \mathbb{R}^d$ is an attracting set if it is compact and there exists a neighborhood $U$ such that for any $\varepsilon > 0$, $\exists\, T(\varepsilon) \ge 0$ with $\Phi_{[T(\varepsilon), +\infty)}(U) \subset N^\varepsilon(A)$. Such a $U$ is called the fundamental neighborhood of $A$. If, in addition to being compact, the attracting set is also invariant, then it is called an attractor. The basin of attraction of $A$ is given by $B(A) = \{x \mid \omega_\Phi(x) \subset A\}$. It is called Lyapunov stable if for all $\delta > 0$, $\exists\, \varepsilon > 0$ such that $\Phi_{[0, +\infty)}(N^\varepsilon(A)) \subseteq N^\delta(A)$. We use $T(\varepsilon)$ and $T_\varepsilon$ interchangeably to denote the dependence of $T$ on $\varepsilon$.

We define the lower and upper limits of sequences of sets. Let $\{K_n\}_{n \ge 1}$ be a sequence of sets in $\mathbb{R}^d$.

1. The lower limit of $\{K_n\}_{n \ge 1}$ is given by $\mathrm{Liminf}_{n \to \infty} K_n := \{x \mid \lim_{n \to \infty} d(x, K_n) = 0\}$.

2. The upper limit of $\{K_n\}_{n \ge 1}$ is given by $\mathrm{Limsup}_{n \to \infty} K_n := \{y \mid \liminf_{n \to \infty} d(y, K_n) = 0\}$.

We may interpret that the lower limit collects the limit points of $\{K_n\}_{n \ge 1}$ while the upper limit collects its accumulation points.
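The difference between the two limits can be seen numerically. The Python sketch below (NumPy assumed; `set_limits` is a hypothetical helper) approximates both limits for finite subsets of $\mathbb{R}$ on a grid, using the last `tail` sets as a stand-in for $n \to \infty$, so it is a heuristic rather than an exact computation. For $K_n = \{(-1)^n\}$ the lower limit is empty while the upper limit is $\{-1, 1\}$; for $K_n = \{1/n\}$ both equal $\{0\}$.

```python
import numpy as np


def set_limits(sets, grid, tail=200, tol=5e-3):
    """Grid approximation of the lower and upper limits of finite sets K_n in R.

    Only the last `tail` sets are used as a stand-in for n -> infinity:
      lower limit: grid points x with d(x, K_n) < tol for every n in the tail,
      upper limit: grid points y with d(y, K_n) < tol for some n in the tail.
    """
    tail_sets = [np.asarray(K, dtype=float) for K in sets[-tail:]]
    dist = np.array([[np.min(np.abs(K - x)) for K in tail_sets] for x in grid])
    lower = grid[dist.max(axis=1) < tol]
    upper = grid[dist.min(axis=1) < tol]
    return lower, upper


grid = np.round(np.linspace(-2.0, 2.0, 401), 2)
alternating = [[(-1.0) ** n] for n in range(1, 1001)]   # K_n = {(-1)^n}
shrinking = [[1.0 / n] for n in range(1, 1001)]         # K_n = {1/n}

lo_alt, up_alt = set_limits(alternating, grid)   # lower limit empty, upper limit {-1, 1}
lo_shr, up_shr = set_limits(shrinking, grid)     # both limits equal {0}
print(lo_alt, up_alt, lo_shr, up_shr)
```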
2.2 The assumptions

Recall that we have the following recursion in $\mathbb{R}^d$:

$$x_{n+1} = x_n + a(n)\left[y_n + M_{n+1}\right], \quad \text{where } y_n \in h(x_n).$$

We state our assumptions below:

(A1) $h: \mathbb{R}^d \to \{\text{subsets of } \mathbb{R}^d\}$ is a Marchaud map.

(A2) $\{a(n)\}_{n \ge 0}$ is a scalar sequence such that $a(n) > 0$ $\forall n$, $\sum_{n \ge 0} a(n) = \infty$ and $\sum_{n \ge 0} a(n)^2 < \infty$. Without loss of generality we let $\sup_n a(n) \le 1$.

(A3) $\{M_n\}_{n \ge 1}$ is a martingale difference sequence with respect to the filtration $\mathcal{F}_n := \sigma(x_0, M_1, \ldots, M_n)$, $n \ge 0$.

(i) $\{M_n\}_{n \ge 1}$ is a square integrable sequence.

(ii) $E[\|M_{n+1}\|^2 \mid \mathcal{F}_n] \le K(1 + \|x_n\|^2)$, for $n \ge 0$ and some constant $K > 0$. Without loss of generality assume that the same constant, $K$, works for both the point-wise boundedness condition of (A1) (see condition (ii) in the definition of Marchaud map in Section 2.1) and (A3).


For $c \ge 1$ and $x \in \mathbb{R}^d$, define $h_c(x) = \{y \mid cy \in h(cx)\}$. Further, for each $x \in \mathbb{R}^d$ define $h_\infty(x) := \overline{\mathrm{Liminf}_{c \to \infty} h_c(x)}$, i.e., the closure of the lower limit of $\{h_c(x)\}_{c \ge 1}$.
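For interval-valued maps on $\mathbb{R}$ the rescaled map $h_c$ is easy to compute. The Python sketch below uses a hypothetical example, $h(x) = -x + [-1, 1]$, for which $h_c(x) = -x + [-1/c, 1/c]$ and hence $h_\infty(x) = \{-x\}$; the limiting differential inclusion reduces to $\dot{x} = -x$, whose attracting set is $\{0\}$. The helper name `scaled_map` is hypothetical.

```python
def scaled_map(h, c):
    """Return h_c with h_c(x) = {y : c*y in h(c*x)} for an interval-valued h on R."""
    def h_c(x):
        low, high = h(c * x)
        return low / c, high / c      # dividing by c > 0 preserves the order of the endpoints
    return h_c


def h(x):
    """Illustrative Marchaud map h(x) = -x + [-1, 1], returned as (low, high)."""
    return -x - 1.0, -x + 1.0


for c in (1, 10, 1_000, 1_000_000):
    print(c, scaled_map(h, c)(2.0))
# h_c(2) = [-2 - 1/c, -2 + 1/c] shrinks to {-2} as c grows, so here h_inf(x) = {-x}.
```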


(A4) $h_\infty(x)$ is non-empty for all $x \in \mathbb{R}^d$. Further, the differential inclusion $\dot{x}(t) \in h_\infty(x(t))$ has an attracting set, $A$, with $\overline{B}_1(0)$ as a subset of its fundamental neighborhood. This attracting set is such that $A \subseteq B_1(0)$.

(A5) Let $c_n \ge 1$ be an increasing sequence of integers such that $c_n \uparrow \infty$ as $n \to \infty$. Further, let $x_n \to x$ and $y_n \to y$ as $n \to \infty$, such that $y_n \in h_{c_n}(x_n)$, $\forall n$; then $y \in h_\infty(x)$.


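Since the attracting set $A \subseteq B_1(0)$ is compact, we conclude that $\sup_{x \in A} \|x\| < 1$. To see this, for all $x \in A$ define $\delta(x) := \sup_{y \in B_{\varepsilon(x)}(x)} \|y\|$, where $\varepsilon(x) > 0$ is chosen such that $\overline{B}_{\varepsilon(x)}(x) \subseteq B_1(0)$. For all $x \in A$ we have $\delta(x) < 1$. Further, $\{B_{\varepsilon(x)}(x) \mid x \in A\}$ is an open cover of $A$. Let $\{B_{\varepsilon(x_i)}(x_i) \mid 1 \le i \le n\}$ be a finite sub-cover and $\delta := \max_{1 \le i \le n} \delta(x_i)$. Clearly, it follows that $\sup_{x \in A} \|x\| \le \delta < 1$. Define $\delta_1 := \sup_{x \in A} \|x\|$ and pick real numbers $\delta_2$, $\delta_3$ and $\delta_4$ such that

$$\sup_{x \in A} \|x\| = \delta_1 < \delta_2 < \delta_3 < \delta_4 < 1.$$

We shall use this sequence later on.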
Assumptions (A1)-(A3) are the same as in Benaïm et al. [7]. However, the assumption on the stability of the iterates is replaced by (A4) and (A5). We show that (A4) and (A5) are sufficient conditions to ensure stability of the iterates. We start by observing that $h_c$, $c \ge 1$, and $h_\infty$ are Marchaud maps. Further, we show that the constant associated with the point-wise boundedness property is the $K$ of (A1) and (A3).

Proposition 1. $h_\infty$ and $h_c$, $c \ge 1$, are Marchaud maps.

Proof. Fix $c \ge 1$ and $x \in \mathbb{R}^d$. To prove that $h_c(x)$ is compact, we show that it is closed and bounded. For $n \ge 1$, let $y_n \in h_c(x)$ and let $\lim_{n \to \infty} y_n = y$. It follows that $cy_n \in h(cx)$ for each $n \ge 1$ and $\lim_{n \to \infty} cy_n = cy$. Since $h(cx)$ is closed, we have that $cy \in h(cx)$ and $y \in h_c(x)$. If we show that $h_c$ is point-wise bounded then we can conclude that $h_c(x)$ is compact. To prove the aforementioned, let $y \in h_c(x)$; then $cy \in h(cx)$. Since $h$ is point-wise bounded by (A1), we have that $c\|y\| \le K(1 + \|cx\|)$, hence

$$\|y\| \le K\left(\tfrac{1}{c} + \|x\|\right) \le K(1 + \|x\|).$$

Since $c\ (\ge 1)$ and $x$ are arbitrarily chosen, $h_c$ is point-wise bounded and the compactness of $h_c(x)$ follows. The set $h_c(x) = \{z/c \mid z \in h(cx)\}$ is convex since $h(cx)$ is convex and $h_c(x)$ is obtained by scaling it by $1/c$.

Next, we show that $h_c$ is upper-semicontinuous. Let $\lim_{n \to \infty} x_n = x$, $\lim_{n \to \infty} y_n = y$ and $y_n \in h_c(x_n)$, $\forall n \ge 1$. We need to show that $y \in h_c(x)$. We have that $cy_n \in h(cx_n)$ for each $n \ge 1$. Since $\lim_{n \to \infty} cx_n = cx$ and $\lim_{n \to \infty} cy_n = cy$, we conclude that $cy \in h(cx)$, i.e., $y \in h_c(x)$, since $h$ is assumed to be upper-semicontinuous.

It is left to show that $h_\infty$ is a Marchaud map. To prove that $\|z\| \le K(1 + \|x\|)$ for all $z \in h_\infty(x)$, it is enough to prove that $\|y\| \le K(1 + \|x\|)$ for all $y \in \mathrm{Liminf}_{c \to \infty} h_c(x)$. Fix some $y \in \mathrm{Liminf}_{c \to \infty} h_c(x)$; then there exist $z_n \in h_n(x)$, $n \ge 1$, such that $\lim_{n \to \infty} \|y - z_n\| = 0$. We have that

$$\|y\| \le \|y - z_n\| + \|z_n\|.$$

Since $h_c$, $c \ge 1$, is point-wise bounded (the associated constant is independent of $c$ and equals $K$), the above inequality becomes

$$\|y\| \le \|y - z_n\| + K(1 + \|x\|).$$

Letting $n \to \infty$ in the above inequality, we obtain $\|y\| \le K(1 + \|x\|)$. Recall that $h_\infty(x) = \overline{\mathrm{Liminf}_{c \to \infty} h_c(x)}$; being closed and bounded, it is compact.

Again, to show that $h_\infty(x)$ is convex for each $x \in \mathbb{R}^d$, we start by proving that $\mathrm{Liminf}_{c \to \infty} h_c(x)$ is convex. Let $u, v \in \mathrm{Liminf}_{c \to \infty} h_c(x)$ and $0 \le t \le 1$. We need to show that $tu + (1 - t)v \in \mathrm{Liminf}_{c \to \infty} h_c(x)$. Consider an arbitrary sequence $\{c_n\}_{n \ge 1}$ such that $c_n \uparrow \infty$; then there exist $u_n, v_n \in h_{c_n}(x)$ such that $\|u_n - u\| \to 0$ and $\|v_n - v\| \to 0$ as $c_n \uparrow \infty$. Since $h_{c_n}(x)$ is convex, it follows that $tu_n + (1 - t)v_n \in h_{c_n}(x)$; further,

$$\lim_{c_n \to \infty} \left(tu_n + (1 - t)v_n\right) = tu + (1 - t)v.$$

Since we started with an arbitrary sequence $c_n \uparrow \infty$, it follows that $tu + (1 - t)v \in \mathrm{Liminf}_{c \to \infty} h_c(x)$. Now we can prove that $h_\infty(x)$ is convex. Let $u, v \in h_\infty(x)$. Then $\exists\, \{u_n\}_{n \ge 1}, \{v_n\}_{n \ge 1} \subseteq \mathrm{Liminf}_{c \to \infty} h_c(x)$ such that $u_n \to u$ and $v_n \to v$ as $n \to \infty$. We need to show that $tu + (1 - t)v \in h_\infty(x)$, for $0 \le t \le 1$. Since $tu_n + (1 - t)v_n \in \mathrm{Liminf}_{c \to \infty} h_c(x) \subseteq h_\infty(x)$ and $h_\infty(x)$ is closed, the desired result is obtained by letting $n \to \infty$ in $tu_n + (1 - t)v_n$.

Finally, we show that $h_\infty$ is upper-semicontinuous. Let $\lim_{n \to \infty} x_n = x$, $\lim_{n \to \infty} y_n = y$ and $y_n \in h_\infty(x_n)$, $\forall n \ge 1$. We need to show that $y \in h_\infty(x)$. Since $y_n \in h_\infty(x_n)$, $\exists\, z_n \in \mathrm{Liminf}_{c \to \infty} h_c(x_n)$ such that $\|z_n - y_n\| < 1/n$. Since $z_n \in \mathrm{Liminf}_{c \to \infty} h_c(x_n)$, $n \ge 1$, it follows that there exist $c_n$ such that for all $c \ge c_n$, $d(z_n, h_c(x_n)) < 1/n$. In particular, $\exists\, u_n \in h_{c_n}(x_n)$ such that $\|z_n - u_n\| < 1/n$. We choose the sequence $\{c_n\}_{n \ge 1}$ such that $c_{n+1} > c_n$ for each $n \ge 1$. We now have the following: $\lim_{n \to \infty} u_n = y$, $u_n \in h_{c_n}(x_n)$ $\forall n$ and $\lim_{n \to \infty} x_n = x$. It follows directly from assumption (A5) that $y \in h_\infty(x)$. □
3 Stability and convergence of stochastic recursive inclusions

We begin by providing a brief outline of our approach to prove the stability of an SRI under assumptions (A1)-(A5). First we divide the time line, $[0, \infty)$, approximately into intervals of length $T$. We shall explain later how we choose and fix $T$. Then we construct a linearly interpolated trajectory from the given stochastic recursive inclusion; the construction is explained in the next paragraph. A sequence of 'rescaled' trajectories of length $T$ is constructed as follows: at the beginning of each $T$-length interval we observe the trajectory to see if it is outside the unit ball; if so, we scale it back to the boundary of the unit ball. This scaling factor is then used to scale the 'rest of the $T$-length trajectory'.

To show that the iterates are bounded almost surely we need to show that the linearly interpolated trajectory does not 'run off' to infinity. To do so we assume that this is not true and show that there exists a subsequence of the rescaled $T$-length trajectories that has a solution to $\dot{x}(t) \in h_\infty(x(t))$ as a limit point in $C([0, T], \mathbb{R}^d)$. We choose and fix $T$ such that any solution to $\dot{x}(t) \in h_\infty(x(t))$ with an initial value inside the unit ball is close to the origin at the end of time $T$. In this paper we choose $T = T(\delta_2 - \delta_1) + 1$. We then argue that the linearly interpolated trajectory is forced to make arbitrarily large 'jumps' within time $T$. The Gronwall inequality is then used to show that this is not possible.

Once we prove stability of the recursion we invoke Theorem 3.6 and Lemma 3.8 from Benaïm, Hofbauer and Sorin [7] to conclude that the limit set is a closed, connected, internally chain transitive and invariant set associated with $\dot{x}(t) \in h(x(t))$.


We construct the linearly interpolated trajectory $\bar{x}(t)$, for $t \in [0, \infty)$, from the sequence $\{x_n\}$ as follows: define $t(0) := 0$, $t(n) := \sum_{i=0}^{n-1} a(i)$. Let $\bar{x}(t(n)) := x_n$ and for $t \in (t(n), t(n+1))$, let

$$\bar{x}(t) := \left(\frac{t(n+1) - t}{t(n+1) - t(n)}\right) \bar{x}(t(n)) + \left(\frac{t - t(n)}{t(n+1) - t(n)}\right) \bar{x}(t(n+1)).$$

We define a piecewise constant trajectory using the sequence $\{y_n\}_{n \ge 0}$ as follows: $\bar{y}(t) := y_n$ for $t \in [t(n), t(n+1))$, $n \ge 0$.
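A short Python sketch of this construction, assuming scalar iterates and using only the standard library; `interpolate` and its argument names are hypothetical. It returns the times $t(n)$ together with the functions $\bar{x}$ and $\bar{y}$, both defined for $t \ge 0$.

```python
import bisect
import itertools


def interpolate(xs, ys, steps):
    """Return t(0..N), the linear interpolation x_bar(t) and the piecewise constant y_bar(t).

    xs[n] stands for x_n (n = 0..N), ys[n] for y_n and steps[n] for a(n) (n = 0..N-1).
    """
    t = [0.0] + list(itertools.accumulate(steps))   # t(0) = 0, t(n) = a(0) + ... + a(n-1)

    def index(s):
        if s < 0:
            raise ValueError("the trajectories are defined for t >= 0 only")
        return bisect.bisect_right(t, s) - 1          # largest n with t(n) <= s

    def x_bar(s):
        n = index(s)
        if n >= len(xs) - 1:                          # at or beyond the last available time
            return xs[-1]
        w = (s - t[n]) / (t[n + 1] - t[n])
        return (1.0 - w) * xs[n] + w * xs[n + 1]

    def y_bar(s):
        return ys[min(index(s), len(ys) - 1)]

    return t, x_bar, y_bar


t, x_bar, y_bar = interpolate(xs=[0.0, 2.0, 1.0, 4.0], ys=[5.0, -3.0, 7.0], steps=[1.0, 0.5, 1.0 / 3.0])
print(x_bar(0.5), x_bar(1.25), y_bar(1.2))   # 1.0 1.5 -3.0
```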


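We know that the DI given by $\dot{x}(t) \in h_\infty(x(t))$ has an attracting set $A$ such that $\delta_1 := \sup_{x \in A} \|x\| < 1$. Let us fix $T := T(\delta_2 - \delta_1) + 1$, where $T(\delta_2 - \delta_1)$ is as defined in Section 2.1. Then $\|x(t)\| < \delta_2$ for all $t \ge T(\delta_2 - \delta_1)$, where $\{x(t) : t \in [0, \infty)\}$ is a solution to $\dot{x}(t) \in h_\infty(x(t))$ with an initial value inside the closed unit ball around the origin. Indeed, such a solution lies in $N^{\delta_2 - \delta_1}(A)$ after time $T(\delta_2 - \delta_1)$, and every point of $N^{\delta_2 - \delta_1}(A)$ has norm less than $\delta_1 + (\delta_2 - \delta_1) = \delta_2$.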
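The choice of $T$ can be illustrated numerically. The Python sketch below assumes, purely for illustration, that $h_\infty(x) = \{-x\}$ on $\mathbb{R}^2$, so $A = \{0\}$ and $\delta_1 = 0$, and takes $\delta_2 = 0.5$ (NumPy assumed; `settle_time` is a hypothetical helper). It estimates $T(\delta_2 - \delta_1)$ by Euler integration from points on the unit circle; for this linear field the norm decreases monotonically, so the first entrance time into the $\delta_2$-ball is a valid $T(\varepsilon)$, and it should be close to $\log 2$, the exact threshold for this field.

```python
import math

import numpy as np


def settle_time(field, eps, n_dirs=64, dt=1e-3, t_max=50.0):
    """Euler-integrate dx/dt = field(x) from points on the unit circle and return the first
    time at which every sampled trajectory lies in the open ball of radius eps (A = {0})."""
    angles = np.linspace(0.0, 2.0 * math.pi, n_dirs, endpoint=False)
    x = np.stack([np.cos(angles), np.sin(angles)], axis=1)   # initial points with ||x(0)|| = 1
    t = 0.0
    while t < t_max:
        if np.max(np.linalg.norm(x, axis=1)) < eps:
            return t
        x = x + dt * field(x)
        t += dt
    raise RuntimeError("trajectories did not settle before t_max")


delta1, delta2 = 0.0, 0.5          # A = {0}, so delta1 = sup_{x in A} ||x|| = 0
T_eps = settle_time(lambda x: -x, eps=delta2 - delta1)
T = T_eps + 1.0                    # the choice T = T(delta2 - delta1) + 1
print(T_eps, math.log(2.0), T)
```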


Define $T_0 := 0$ and $T_n := \min\{t(m) : t(m) \ge T_{n-1} + T\}$, $n \ge 1$. Observe that there exists a subsequence $\{m(n)\}_{n \ge 0}$ of $\{n\}$ such that $T_n = t(m(n))$ $\forall n \ge 0$. We construct the rescaled trajectory, $\hat{x}(t)$, $t \ge 0$, as follows: let $t \in [T_n, T_{n+1})$ for some $n \ge 0$; then $\hat{x}(t) := \bar{x}(t)/r(n)$, where $r(n) = \|\bar{x}(T_n)\| \vee 1$. Also, let $\hat{x}(T_{n+1}^-) := \lim_{t \uparrow T_{n+1}} \bar{x}(t)/r(n)$, $t \in [T_n, T_{n+1})$. The corresponding 'rescaled $y$ iterates' are given by $\hat{y}(t) := \bar{y}(t)/r(n)$ and the rescaled martingale noise terms by $\hat{M}_{k+1} := M_{k+1}/r(n)$, $t(k) \in [T_n, T_{n+1})$, $n \ge 0$.
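The bookkeeping behind $T_n$, $m(n)$ and $r(n)$ can be sketched in Python as follows, for scalar iterates and with hypothetical helper names; the sketch only rescales the values at the grid points $t(k)$. By construction, every rescaled value at a time $T_n$ has absolute value at most 1.

```python
def rescaling_schedule(xs, steps, T):
    """Indices m(n) with T_n = t(m(n)), and factors r(n) = max(|x_bar(T_n)|, 1).

    T_0 = 0 and T_n = min{t(k) : t(k) >= T_{n-1} + T}; xs[k] = x_k are scalars and
    len(xs) == len(steps) + 1.
    """
    t = [0.0]
    for a in steps:
        t.append(t[-1] + a)                 # t(k) = a(0) + ... + a(k-1)
    m = [0]
    for k in range(1, len(t)):
        if t[k] >= t[m[-1]] + T:            # first t(k) at least T after the previous T_n
            m.append(k)
    r = [max(abs(xs[k]), 1.0) for k in m]
    return m, r


def rescaled(xs, m, r):
    """x_hat(t(k)) = x_k / r(n) for T_n <= t(k) < T_{n+1}; the last block runs to the end."""
    out = []
    for n, start in enumerate(m):
        stop = m[n + 1] if n + 1 < len(m) else len(xs)
        out.extend(xs[k] / r[n] for k in range(start, stop))
    return out


xs = [0.0, 3.0, -6.0, 1.0, 0.5, 8.0, 2.0, 1.0, 0.2]
m, r = rescaling_schedule(xs, steps=[0.5] * 8, T=1.0)
x_hat = rescaled(xs, m, r)
print(m, r)   # [0, 2, 4, 6, 8] [1.0, 6.0, 1.0, 2.0, 1.0]
```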

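Consider the recursion at hand, i.e.,

$$\bar{x}(t(k+1)) = \bar{x}(t(k)) + a(k)\left(\bar{y}(t(k)) + M_{k+1}\right),$$

such that $t(k), t(k+1) \in [T_n, T_{n+1}]$. Multiplying both sides by $1/r(n)$ we get the rescaled recursion:

$$\hat{x}(t(k+1)) = \hat{x}(t(k)) + a(k)\left(\hat{y}(t(k)) + \hat{M}_{k+1}\right).$$

Since $\bar{y}(t(k)) \in h(\bar{x}(t(k)))$, it follows that $\hat{y}(t(k)) \in h_{r(n)}(\hat{x}(t(k)))$. It is worth noting that, since $r(n) \ge 1$, $E[\|\hat{M}_{k+1}\|^2 \mid \mathcal{F}_k] \le K(1 + \|\hat{x}(t(k))\|^2)$.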
3.1 Characterizing limits, in $C([0, T], \mathbb{R}^d)$, of the rescaled trajectories

We do not provide proofs for the first three lemmas since they can be found in Borkar [10] or Benaïm, Hofbauer and Sorin [7]. The first two lemmas essentially state that the rescaled martingale noise converges almost surely.

Lemma 1. $\sup_{t \in [0, T]} E\|\hat{x}(t)\|^2 < \infty$.

Lemma 2. The rescaled sequence $\{\zeta_n\}_{n \ge 1}$, where $\zeta_n = \sum_{m=0}^{n-1} a(m) \hat{M}_{m+1}$, is convergent almost surely.
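Lemma 2 rests on the fact that $\sum_n a(n)^2 < \infty$ makes the noise partial sums a convergent martingale. The Python sketch below (NumPy assumed) illustrates this in the unscaled case $r(n) \equiv 1$, with step sizes $a(n) = 1/(n+1)$ and i.i.d. standard Gaussian noise; these are illustrative assumptions, and dividing by $r(n) \ge 1$ only shrinks the noise. The helper `zeta_path` is hypothetical.

```python
import numpy as np


def zeta_path(n_steps, noise_std=1.0, seed=3):
    """Partial sums zeta_n = sum_{m < n} a(m) M_{m+1} with a(m) = 1/(m+1) and i.i.d. N(0, noise_std^2) noise."""
    rng = np.random.default_rng(seed)
    a = 1.0 / np.arange(1, n_steps + 1)                       # a(0), ..., a(n_steps - 1)
    increments = a * noise_std * rng.standard_normal(n_steps)
    return np.concatenate(([0.0], np.cumsum(increments)))     # zeta_0 = 0, ..., zeta_{n_steps}


zeta = zeta_path(200_000)
# Because sum a(m)^2 < inf, zeta_n converges a.s.; its oscillation over a late tail is small.
tail_oscillation = np.max(np.abs(zeta[100_000:] - zeta[100_000]))
print(zeta[-1], tail_oscillation)
```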

The rest of the lemmas are needed to prove the stability theorem, Theorem 1. We begin by showing that the rescaled trajectories are bounded almost surely.

Lemma 3. $\sup_{t \in [0, \infty)} \|\hat{x}(t)\| < \infty$ a.s.

As stated earlier, we omit the proof of the above lemma and instead establish a couple of notations used later. Let $\Omega' = \{\omega \mid \{\zeta_n(\omega)\}_{n \ge 1} \text{ converges}\}$. Since $\zeta_n$, $n \ge 1$, converges on $\Omega'$, there exists $M_\omega < \infty$, possibly sample path dependent, such that $\left\|\sum_{l=0}^{k} a(m(n) + l)\, \hat{M}_{m(n)+l+1}\right\| \le M_\omega$, where $M_\omega$ is independent of $n$ and $k$. Also, let $\sup_{t \ge 0} \|\hat{x}(t)\| \le K_\omega$, where $K_\omega := (1 + M_\omega + (T + 1)K)e^{K(T+1)}$ is also a constant that is sample path dependent.


Let $x^n(t)$, $t \in [0, T]$, be the solution (up to time $T$) to $\dot{x}^n(t) = \hat{y}(T_n + t)$, with the initial condition $x^n(0) = \hat{x}(T_n)$. Clearly, we have

$$x^n(t) = \hat{x}(T_n) + \int_0^t \hat{y}(T_n + z)\, dz. \tag{4}$$

The following two lemmas are inspired by ideas from Benaïm, Hofbauer and Sorin [7] as well as Borkar [10]. In the lemma that follows we show that the limit sets of $\{x^n(\cdot) \mid n \ge 0\}$ and $\{\hat{x}(T_n + \cdot) \mid n \ge 0\}$ coincide. We seek limits in $C([0, T], \mathbb{R}^d)$.

Lemma 4. $\lim_{n \to \infty} \sup_{t \in [T_n, T_n + T]} \|x^n(t) - \hat{x}(t)\| = 0$ a.s., where, for $t \in [T_n, T_n + T]$, we write $x^n(t)$ for $x^n(t - T_n)$.

Proof. Let $t \in [t(m(n) + k), t(m(n) + k + 1))$ and $t(m(n) + k + 1) \le T_{n+1}$. We first assume that $t(m(n) + k + 1) < T_{n+1}$. We have the following:

$$\hat{x}(t) = \left(\frac{t(m(n)+k+1) - t}{a(m(n)+k)}\right) \hat{x}(t(m(n)+k)) + \left(\frac{t - t(m(n)+k)}{a(m(n)+k)}\right) \hat{x}(t(m(n)+k+1)).$$

Substituting for $\hat{x}(t(m(n)+k+1))$ in the above equation we get:

$$\hat{x}(t) = \left(\frac{t(m(n)+k+1) - t}{a(m(n)+k)}\right) \hat{x}(t(m(n)+k)) + \left(\frac{t - t(m(n)+k)}{a(m(n)+k)}\right) \left(\hat{x}(t(m(n)+k)) + a(m(n)+k)\left(\hat{y}(t(m(n)+k)) + \hat{M}_{m(n)+k+1}\right)\right),$$

hence,

$$\hat{x}(t) = \hat{x}(t(m(n)+k)) + (t - t(m(n)+k))\left(\hat{y}(t(m(n)+k)) + \hat{M}_{m(n)+k+1}\right).$$

Unfolding $\hat{x}(t(m(n)+k))$ over $k$ we get

$$\hat{x}(t) = \hat{x}(T_n) + \sum_{l=0}^{k-1} a(m(n)+l)\left(\hat{y}(t(m(n)+l)) + \hat{M}_{m(n)+l+1}\right) + (t - t(m(n)+k))\left(\hat{y}(t(m(n)+k)) + \hat{M}_{m(n)+k+1}\right). \tag{5}$$

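Now we consider $x^n(t)$, i.e.,

$$x^n(t) = \hat{x}(T_n) + \int_0^{t - T_n} \hat{y}(T_n + z)\, dz = \hat{x}(T_n) + \int_{T_n}^{t} \hat{y}(z)\, dz.$$

Splitting the above integral, we get

$$x^n(t) = \hat{x}(T_n) + \sum_{l=0}^{k-1} \int_{t(m(n)+l)}^{t(m(n)+l+1)} \hat{y}(z)\, dz + \int_{t(m(n)+k)}^{t} \hat{y}(z)\, dz.$$

Thus,

$$x^n(t) = \hat{x}(T_n) + \sum_{l=0}^{k-1} a(m(n)+l)\, \hat{y}(t(m(n)+l)) + (t - t(m(n)+k))\, \hat{y}(t(m(n)+k)). \tag{6}$$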


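From (5) and (6) it follows that

$$\|x^n(t) - \hat{x}(t)\| \le \left\|\sum_{l=0}^{k-1} a(m(n)+l)\, \hat{M}_{m(n)+l+1}\right\| + \left\|(t - t(m(n)+k))\, \hat{M}_{m(n)+k+1}\right\|,$$

and hence, since $0 \le t - t(m(n)+k) \le a(m(n)+k)$,

$$\|x^n(t) - \hat{x}(t)\| \le \|\zeta_{m(n)+k} - \zeta_{m(n)}\| + \|\zeta_{m(n)+k+1} - \zeta_{m(n)+k}\|.$$

If $t(m(n)+k+1) = T_{n+1}$ then in the proof we may replace $\hat{x}(t(m(n)+k+1))$ with $\hat{x}(T_{n+1}^-)$. The arguments remain the same. Since $\zeta_n$, $n \ge 1$, converges almost surely, the desired result follows. □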
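The key identity in this proof is $\hat{x}(t) - x^n(t) = (\zeta_{m(n)+k} - \zeta_{m(n)}) + w\,(\zeta_{m(n)+k+1} - \zeta_{m(n)+k})$ with $w = (t - t(m(n)+k))/a(m(n)+k) \in [0, 1]$. The Python sketch below (NumPy assumed) checks the resulting bound numerically on one block with $T_n = t(0)$ and $r(n) = 1$; the step sizes, the selection $\hat{y}$ and the noise are arbitrary illustrative choices, and `lemma4_gap` is a hypothetical name. It returns the smallest slack of the bound over random times, which should be non-negative up to rounding.

```python
import numpy as np


def lemma4_gap(n_steps=500, n_times=2000, seed=4):
    """Smallest slack of ||x^n(t) - x_hat(t)|| <= ||zeta_k - zeta_0|| + ||zeta_{k+1} - zeta_k||
    over random times t, on one block with T_n = t(0) and r(n) = 1 (scalar case)."""
    rng = np.random.default_rng(seed)
    a = 1.0 / np.arange(1, n_steps + 1)            # illustrative step sizes a(k)
    noise = rng.standard_normal(n_steps)           # plays the role of M_hat_{k+1}
    x = np.empty(n_steps + 1)
    y = np.empty(n_steps)
    x[0] = 0.9                                     # x_hat(T_n), inside the unit ball
    for k in range(n_steps):
        y[k] = -x[k] + 0.3 * np.sin(k)             # an illustrative selection y_hat(t(k))
        x[k + 1] = x[k] + a[k] * (y[k] + noise[k])
    t = np.concatenate(([0.0], np.cumsum(a)))      # t(k)
    zeta = np.concatenate(([0.0], np.cumsum(a * noise)))
    worst = np.inf
    for s in rng.uniform(0.0, t[-1], size=n_times):
        k = min(int(np.searchsorted(t, s, side="right")) - 1, n_steps - 1)  # t(k) <= s < t(k+1)
        w = (s - t[k]) / a[k]
        x_hat = (1.0 - w) * x[k] + w * x[k + 1]                    # linear interpolation
        x_int = x[0] + np.dot(a[:k], y[:k]) + (s - t[k]) * y[k]    # x^n(s), as in (6)
        bound = abs(zeta[k] - zeta[0]) + abs(zeta[k + 1] - zeta[k])
        worst = min(worst, bound - abs(x_int - x_hat))
    return worst


slack = lemma4_gap()
print(slack)   # non-negative up to floating-point rounding
```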

The sets $\{x^n(t), t \in [0, T] \mid n \ge 0\}$ and $\{\hat{x}(T_n + t), t \in [0, T] \mid n \ge 0\}$ can be viewed as subsets of $C([0, T], \mathbb{R}^d)$. It can be shown that $\{x^n(t), t \in [0, T] \mid n \ge 0\}$ is equi-continuous and point-wise bounded. Thus, from the Arzelà-Ascoli theorem, $\{x^n(t), t \in [0, T] \mid n \ge 0\}$ is relatively compact. It follows from Lemma 3 that the set $\{\hat{x}(T_n + t), t \in [0, T] \mid n \ge 0\}$ is also relatively compact in $C([0, T], \mathbb{R}^d)$.

Lemma 5. Let $r(n) \uparrow \infty$; then any limit point of $\{\hat{x}(T_n + t), t \in [0, T] : n \ge 0\}$ is of the form $x(t) = x(0) + \int_0^t y(s)\, ds$, where $y: [0, T] \to \mathbb{R}^d$ is a measurable function and $y(t) \in h_\infty(x(t))$, $t \in [0, T]$.
Proof. We define the notation $[t] := \max\{t(k) \mid t(k) \le t\}$, $t \ge 0$. Let $t \in [T_n, T_{n+1})$; then $\hat{y}(t) \in h_{r(n)}(\hat{x}([t]))$ and $\|\hat{y}(t)\| \le K(1 + \|\hat{x}([t])\|)$, since $h_{r(n)}$ is a Marchaud map ($K$ is the constant associated with the point-wise boundedness property). It follows from Lemma 3 that $\sup_{t \in [0, \infty)} \|\hat{y}(t)\| < \infty$ a.s. Using observations made earlier, we can deduce that there exists a subsequence of $\mathbb{N}$, say $\{l\} \subseteq \{n\}$, such that $\hat{x}(T_l + \cdot) \to x(\cdot)$ in $C([0, T], \mathbb{R}^d)$ and $\hat{y}(T_l + \cdot) \to y(\cdot)$ weakly in $L_2([0, T], \mathbb{R}^d)$. From Lemma 4 it follows that $x^l(\cdot) \to x(\cdot)$ in $C([0, T], \mathbb{R}^d)$. Letting $r(l) \uparrow \infty$ in

$$x^l(t) = x^l(0) + \int_0^t \hat{y}(t(m(l)) + z)\, dz, \quad t \in [0, T],$$

we get $x(t) = x(0) + \int_0^t y(z)\, dz$ for $t \in [0, T]$. Since $\|\hat{x}(T_n)\| \le 1$ we have $\|x(0)\| \le 1$.

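Since $\hat{y}(T_l + \cdot) \to y(\cdot)$ weakly in $L_2([0, T], \mathbb{R}^d)$, there exists $\{l(k)\} \subseteq \{l\}$ such that

$$\frac{1}{N} \sum_{k=1}^{N} \hat{y}(T_{l(k)} + \cdot) \to y(\cdot) \text{ strongly in } L_2([0, T], \mathbb{R}^d).$$

Further, there exists $\{N(m)\} \subseteq \{N\}$ such that

$$\frac{1}{N(m)} \sum_{k=1}^{N(m)} \hat{y}(T_{l(k)} + \cdot) \to y(\cdot) \text{ a.e. on } [0, T].$$

Let us fix $t_0 \in \left\{t \;\middle|\; \frac{1}{N(m)} \sum_{k=1}^{N(m)} \hat{y}(T_{l(k)} + t) \to y(t),\ t \in [0, T]\right\}$; then

$$\lim_{N(m) \to \infty} \frac{1}{N(m)} \sum_{k=1}^{N(m)} \hat{y}(T_{l(k)} + t_0) = y(t_0).$$

Since $h_\infty(x(t_0))$ is convex and compact (Proposition 1), to show that $y(t_0) \in h_\infty(x(t_0))$ it is enough to prove that $\lim_{l(k) \to \infty} d\left(\hat{y}(T_{l(k)} + t_0), h_\infty(x(t_0))\right) = 0$. If not, $\exists\, \varepsilon > 0$ and $\{n(k)\} \subseteq \{l(k)\}$ such that $d\left(\hat{y}(T_{n(k)} + t_0), h_\infty(x(t_0))\right) > \varepsilon$. Since $\{\hat{y}(T_{n(k)} + t_0)\}_{k \ge 1}$ is norm bounded, it follows that there is a convergent subsequence. For the sake of convenience we assume that $\lim_{k \to \infty} \hat{y}(T_{n(k)} + t_0) = y$, for some $y \in \mathbb{R}^d$. Since $\hat{y}(T_{n(k)} + t_0) \in h_{r(n(k))}(\hat{x}([T_{n(k)} + t_0]))$ and $\lim_{k \to \infty} \hat{x}([T_{n(k)} + t_0]) = x(t_0)$, it follows from assumption (A5) that $y \in h_\infty(x(t_0))$. This leads to a contradiction. □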
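The passage from weak convergence to strong convergence of Cesàro means (a Banach-Saks type argument) can be seen on a standard example. In the Python sketch below (NumPy assumed; the example is ours, not the paper's), $y_k(s) = \sin(2\pi k s)$ converges weakly to 0 in $L_2([0, 1])$ while $\|y_k\| = 1/\sqrt{2}$ for every $k$, yet the averages $\frac{1}{N}\sum_{k=1}^{N} y_k$ have norm $1/\sqrt{2N} \to 0$. Norms are approximated by Riemann sums on a uniform grid.

```python
import numpy as np

# Uniform grid on [0, T] with T = 1; L^2 norms are approximated by Riemann sums.
T, n_grid = 1.0, 20_000
s = np.linspace(0.0, T, n_grid, endpoint=False)
ds = T / n_grid


def l2_norm(f_vals):
    return np.sqrt(np.sum(f_vals ** 2) * ds)


def y_k(k):
    """y_k(s) = sin(2 pi k s): converges weakly to 0 in L^2([0, 1]) but not strongly."""
    return np.sin(2.0 * np.pi * k * s)


N = 100
cesaro = sum(y_k(k) for k in range(1, N + 1)) / N   # (1/N) sum_{k=1}^N y_k

print(l2_norm(y_k(N)))   # about sqrt(1/2) for every k: no strong convergence
print(l2_norm(cesaro))   # about sqrt(1/(2N)): the Cesaro means converge strongly to 0
```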

Note that in the statement of Lemma 5 we can replace '$r(n) \uparrow \infty$' by '$r(l) \uparrow \infty$', where $\{r(l)\}$ is a subsequence of $\{r(n)\}$. Specifically, we can conclude that any limit point of $\{\hat{x}(T_k + t), t \in [0, T] : k \ge 0\}$ in $C([0, T], \mathbb{R}^d)$, conditioned on $r(k) \uparrow \infty$, is of the form $x(t) = x(0) + \int_0^t y(z)\, dz$, where $y(t) \in h_\infty(x(t))$ for $t \in [0, T]$. It should be noted that $y(\cdot)$ may be sample path dependent. The following is an immediate consequence of Lemma 5.
Corollary 1. $\exists\, 1 < R_0 < \infty$ such that $\forall\, r(l) > R_0$, $\|\hat{x}(T_l + \cdot) - x(\cdot)\| < \delta_3 - \delta_2$, where $\{l\} \subseteq \mathbb{N}$ and $x(\cdot)$ is a solution (up to time $T$) of $\dot{x}(t) \in h_\infty(x(t))$ such that $\|x(0)\| \le 1$. The form of $x(\cdot)$ is as given by Lemma 5.

Proof. Assume to the contrary that $\exists\, r(l) \uparrow \infty$ such that $\hat{x}(T_l + \cdot)$ is at least $\delta_3 - \delta_2$ away from any solution to the DI. It follows from Lemma 5 that there exists a subsequence of $\{\hat{x}(T_l + t), 0 \le t \le T : \{l\} \subseteq \mathbb{N}\}$ guaranteed to converge, in $C([0, T], \mathbb{R}^d)$, to a solution of $\dot{x}(t) \in h_\infty(x(t))$ such that $\|x(0)\| \le 1$. This is a contradiction. □

It is worth noting that $R_0$ may be sample path dependent. Since $T = T(\delta_2 - \delta_1) + 1$, we get $\|\hat{x}([T_l + T])\| < \delta_3$ for all $T_l$ such that $\|\bar{x}(T_l)\|\ (= r(l)) > R_0$.

3.2 Stability theorem

We are now ready to prove the stability of an SRI given by (2) under the assumptions (A1)-(A5). If $\sup_n r(n) < \infty$, then the iterates are stable and there is nothing to prove. If, on the other hand, $\sup_n r(n) = \infty$, there exists $\{l\} \subseteq \{n\}$ such that $r(l) \uparrow \infty$. It follows from Lemma 5 that any limit point of $\{\hat{x}(T_l + t), t \in [0, T] : \{l\} \subseteq \{n\}\}$ is of the form $x(t) = x(0) + \int_0^t y(s)\, ds$, where $y(t) \in h_\infty(x(t))$ for $t \in [0, T]$. From assumption (A4), we have that $\|x(T)\| < \delta_2$. Since the time intervals are roughly $T$ apart, for large values of $r(n)$ we can conclude that $\|\hat{x}(T_{n+1}^-)\| < \delta_3$, where $\hat{x}(t) = \bar{x}(t)/r(n)$, $t \in [T_n, T_{n+1})$.

Theorem 1 (Stability Theorem for DI). Under assumptions (A1)-(A5), $\sup_n \|x_n\| < \infty$ a.s.

Proof. As explained earlier, it is sufficient to consider the case when $\sup_n r(n) = \infty$. Let $\{l\} \subseteq \{n\}$ be such that $r(l) \uparrow \infty$. Recall that $T_l = t(m(l))$ and that $[T_l + T] = \max\{t(k) \mid t(k) \le T_l + T\}$.

We have $\|x(T)\| < \delta_2$ since $x(t)$ is a solution, up to time $T$, to the DI given by $\dot{x}(t) \in h_\infty(x(t))$ and we have fixed $T = T(\delta_2 - \delta_1) + 1$. From Lemma 5 we conclude that there exists $N$ such that all of the following happen:

(i) $m(l) \ge N \implies \|\hat{x}([T_l + T])\| < \delta_3$.

(ii) $n \ge N \implies a(n) < \dfrac{\delta_4 - \delta_3}{K(1 + K_\omega) + M_\omega}$.

(iii) $n > m \ge N \implies \|\zeta_n - \zeta_m\| < M_\omega$.

(iv) $m(l) \ge N \implies r(l) > R_0$.

In the above, $R_0$ is defined in the statement of Corollary 1, and $K_\omega$, $M_\omega$ are explained after Lemma 3.





Recall that we chose $\sup_{x \in A} \|x\| = \delta_1 < \delta_2 < \delta_3 < \delta_4 < 1$ in Section 2.2. Let $m(l) \ge N$ and $t(m(l+1)) = t(m(l) + k + 1)$ for some $k \ge 0$. Clearly, from the manner in which the $T_n$ sequence is defined, we have $t(m(l) + k) = [T_l + T]$. As defined earlier, $\hat{x}(t) = \bar{x}(t)/r(n)$, $t \in [T_n, T_{n+1})$ and $n \ge 0$.

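Consider the equation

$$\hat{x}(T_{l+1}^-) = \hat{x}(t(m(l)+k)) + a(m(l)+k)\left(\hat{y}(t(m(l)+k)) + \hat{M}_{m(l)+k+1}\right).$$

Taking norms on both sides we get

$$\|\hat{x}(T_{l+1}^-)\| \le \|\hat{x}(t(m(l)+k))\| + a(m(l)+k)\|\hat{y}(t(m(l)+k))\| + a(m(l)+k)\|\hat{M}_{m(l)+k+1}\|.$$

From the way we have chosen $N$ we conclude that

$$\|\hat{y}(t(m(l)+k))\| \le K(1 + \|\hat{x}(t(m(l)+k))\|) \le K(1 + K_\omega)$$

and that

$$a(m(l)+k)\|\hat{M}_{m(l)+k+1}\| = \|\zeta_{m(l)+k+1} - \zeta_{m(l)+k}\|.$$

Thus we have that

$$\|\hat{x}(T_{l+1}^-)\| \le \|\hat{x}(t(m(l)+k))\| + a(m(l)+k)\left(K(1 + K_\omega) + M_\omega\right).$$

Finally, we have that $\|\hat{x}(T_{l+1}^-)\| < \delta_4$ and, since $\|\hat{x}(T_l)\| = 1$,

$$\frac{\|\bar{x}(T_{l+1})\|}{\|\bar{x}(T_l)\|} = \frac{\|\hat{x}(T_{l+1}^-)\|}{\|\hat{x}(T_l)\|} < \delta_4. \tag{7}$$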


It follows from (7) that $\|\bar{x}(T_{n+1})\| < \delta_4 \|\bar{x}(T_n)\|$ if $\|\bar{x}(T_n)\| > R_0$. From Corollary 1 and the aforementioned we get that the trajectory falls at an exponential rate till it enters $\overline{B}_{R_0}(0)$. Let $t \le T_l$, $t \in [T_n, T_{n+1})$ and $n + 1 \le l$, be the last time that $\bar{x}(t)$ jumps from $\overline{B}_{R_0}(0)$ to the outside of the ball. It follows that $\|\bar{x}(T_{n+1})\| \ge \|\bar{x}(T_l)\|$. Since $r(l) \uparrow \infty$, $\bar{x}(t)$ would be forced to make larger and larger jumps within an interval of length $T + 1$. This leads to a contradiction, since the maximum jump within any fixed time interval can be bounded using the Gronwall inequality. □
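Inequality (7) says that, while $\|\bar{x}(T_n)\| > R_0$, each $T$-length block shrinks the norm by a factor smaller than $\delta_4$. The small Python sketch below turns this into a count of blocks; the function name and the numbers in the usage line are hypothetical.

```python
import math


def blocks_to_enter_ball(r0, R0, delta4):
    """Smallest j with delta4**j * r0 <= R0: an upper bound on the number of T-length blocks
    needed to reach the ball of radius R0 when ||x_bar(T_{n+1})|| < delta4 ||x_bar(T_n)||
    holds while outside it."""
    if r0 <= R0:
        return 0
    j = math.ceil(math.log(R0 / r0) / math.log(delta4))
    while delta4 ** j * r0 > R0:      # guard against floating-point rounding in the logarithms
        j += 1
    return j


print(blocks_to_enter_ball(r0=1e6, R0=10.0, delta4=0.9))   # 110
```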


We now state one of the main theorems of this paper. 

Theorem 2. Under assumptions (A1)-(A5), almost surely, the sequence $\{x_n\}_{n \ge 0}$ generated by the stochastic recursive inclusion given by (2) is bounded and converges to a closed, connected, internally chain transitive and invariant set of $\dot{x}(t) \in h(x(t))$.

Proof. The stability of the iterates is shown in Theorem 1. The convergence can be proved under assumptions (A1)-(A3) and the stability of the iterates in exactly the same manner as in Theorem 3.6 and Lemma 3.8 of Benaïm, Hofbauer and Sorin [7]. □

We have thus far shown that under assumptions (A1)-(A5) the SRI given by (2) is stable and converges to a closed, connected, internally chain transitive and invariant set.
3.3 Stability theorem under modified assumptions

In (A4) we assumed that $\mathrm{Liminf}_{c \to \infty} h_c(x)$ is nonempty for all $x \in \mathbb{R}^d$. In this section we develop a stability criterion for the case when we no longer make such an assumption. In other words, we work with a modified version of assumption (A4) that we call (A6).

Modification of Assumption (A4)

Recall the following SRI:

$$x_{n+1} = x_n + a(n)\left[y_n + M_{n+1}\right], \quad n \ge 0. \tag{8}$$

Since $h_c$ is point-wise bounded for each $c \ge 1$, we have $\sup_{y \in h_c(x)} \|y\| \le K(1 + \|x\|)$, where $x \in \mathbb{R}^d$ (see Proposition 1). This implies that $\{y_c\}_{c \ge 1}$, where $y_c \in h_c(x)$, has at least one convergent subsequence. It follows from the definition of the upper limit of a sequence of sets (see Section 2.1) that $\mathrm{Limsup}_{c \to \infty} h_c(x)$ is non-empty for every $x \in \mathbb{R}^d$. It is worth noting that $\mathrm{Liminf}_{c \to \infty} h_c(x) \subseteq \mathrm{Limsup}_{c \to \infty} h_c(x)$ for every $x \in \mathbb{R}^d$. Another important point to consider is that the lower limits of sequences of sets are harder to compute than their upper limits; see Aubin [3] for more details.

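Recall that $h_c(x) = \{y \mid cy \in h(cx)\}$, where $x \in \mathbb{R}^d$ and $c \ge 1$. Clearly the upper limit, $\mathrm{Limsup}_{c \to \infty} h_c(x) = \{y \mid \liminf_{c \to \infty} d(y, h_c(x)) = 0\}$, is nonempty for every $x \in \mathbb{R}^d$. For $A \subseteq \mathbb{R}^d$, $\overline{co}(A)$ denotes the closure of the convex hull of $A$, i.e., the closure of the smallest convex set containing $A$.

Define $h_\infty(x) := \overline{co}\left(\mathrm{Limsup}_{c \to \infty} h_c(x)\right)$.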

Below we state the modification of assumption (A4) that we call (A6).

(A6) The differential inclusion $\dot{x}(t) \in h_\infty(x(t))$ has an attracting set $A \subseteq B_1(0)$, and $\overline{B}_1(0)$ is a subset of some fundamental neighborhood of $A$.

Note that in (A4), $h_\infty(x) := \overline{\mathrm{Liminf}_{c \to \infty} h_c(x)}$, while in (A6), $h_\infty(x) := \overline{co}\left(\mathrm{Limsup}_{c \to \infty} h_c(x)\right)$. In this section we shall work with this new definition of $h_\infty$.
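The following Python sketch (NumPy assumed) shows a hypothetical Marchaud map, not taken from the paper, for which the lower limit used in (A4) is empty while the upper limit used in (A6) is not: $h(x) = -x + \tfrac{x}{2}\sin(\log(1 + |x|))$. Here $h_c(1) = -1 + \tfrac{1}{2}\sin(\log(1 + c))$ keeps oscillating as $c \to \infty$, so $\mathrm{Liminf}_{c \to \infty} h_c(1) = \emptyset$, while its accumulation points fill $[-1.5, -0.5]$.

```python
import numpy as np


def h(x):
    """Illustrative single-valued Marchaud map on R (continuous, |h(x)| <= 1.5 |x|)."""
    return -x + 0.5 * x * np.sin(np.log1p(np.abs(x)))


def h_c(x, c):
    """h_c(x) = h(c x) / c for a single-valued h."""
    return h(c * x) / c


x = 1.0
cs = np.geomspace(1e3, 1e9, 200_001)     # c ranges over more than two periods of sin(log(1 + c))
values = h_c(x, cs)                      # equals -1 + 0.5 sin(log(1 + c))
print(values.min(), values.max())        # approx -1.5 and -0.5
# The values keep oscillating, so Liminf_{c -> inf} h_c(1) is empty, while the accumulation
# points fill [-1.5, -0.5]; hence co-bar(Limsup) = [-1.5, -0.5], which still points towards 0.
```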

Proposition 2. hoo is a Marchaud map. 

Proof. From the definition of hoo it follows that hoo{x) is convex, compact for 
all X G and hoo is point-wise bounded. It is left to prove that hoo is an 
upper-semicontinuous map. 

Let Xn ^ X, pn ^ y and pn G hoo{xn), for all n > I. We need to show 
that y G hoo{x). We present a proof by contradiction. Since hooix) is convex 
and compact, y ^ hooix) implies that there exists a linear functional on 
say /, such that sup /(z) < a — e and /(y) > a + e, for some a G R 

Zehaa(x) 
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and e > 0. Since yn —>■ y, there exists > 0 such that for all n > N, 
fiVn) > a + |. In other words, hoo{x) n [/ > a + |] ^ for all n > N. 

We use the notation $[f \ge a]$ to denote the set $\{x \mid f(x) \ge a\}$. For the sake of convenience, let us denote the set $\operatorname{Limsup}_{c\to\infty} h_c(x)$ by $A(x)$, where $x \in \mathbb{R}^d$. We claim that $A(x_n) \cap [f \ge \alpha + \epsilon/2] \neq \emptyset$ for all $n \ge N$. We prove this claim later; for now we assume that the claim is true and proceed. Pick $z_n \in A(x_n) \cap [f \ge \alpha + \epsilon/2]$ for each $n \ge N$. It can be shown that $\{z_n\}_{n \ge N}$ is norm bounded and hence contains a convergent subsequence $\{z_{n(k)}\}_{k \ge 1} \subseteq \{z_n\}_{n \ge N}$. Let $\lim_{k\to\infty} z_{n(k)} = z$.

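Since $z_{n(k)} \in \operatorname{Limsup}_{c\to\infty} h_c(x_{n(k)})$, there exists $c_{n(k)} \in \mathbb{N}$ such that $\|w_{n(k)} - z_{n(k)}\| < 1/k$, where $w_{n(k)} \in h_{c_{n(k)}}(x_{n(k)})$. We choose the sequence $\{c_{n(k)}\}_{k \ge 1}$ such that $c_{n(k+1)} > c_{n(k)}$ for each $k \ge 1$.

We have the following: $c_{n(k)} \uparrow \infty$, $x_{n(k)} \to x$, $w_{n(k)} \to z$ and $w_{n(k)} \in h_{c_{n(k)}}(x_{n(k)})$ for all $k \ge 1$. It follows from assumption (A5) that $z \in h_\infty(x)$. Since $z_{n(k)} \to z$ and $f(z_{n(k)}) \ge \alpha + \epsilon/2$ for each $k \ge 1$, we have that $f(z) \ge \alpha + \epsilon/2$. This contradicts the earlier conclusion that $\sup_{z \in h_\infty(x)} f(z) \le \alpha - \epsilon$.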


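It remains to prove that $A(x_n) \cap [f \ge \alpha + \epsilon/2] \neq \emptyset$ for all $n \ge N$. If this were not true, then there exists $\{m(k)\}_{k \ge 1} \subseteq \{n \ge N\}$ such that $A(x_{m(k)}) \subseteq [f < \alpha + \epsilon/2]$ for all $k$. It follows that $h_\infty(x_{m(k)}) = \overline{\mathrm{co}}(A(x_{m(k)})) \subseteq [f \le \alpha + \epsilon/2]$ for each $k \ge 1$. Since $y_{m(k)} \to y$ and $f(y) \ge \alpha + \epsilon$, there exists $N_1$ such that $f(y_{m(k)}) > \alpha + \epsilon/2$ for all $m(k) \ge N_1$, which contradicts $y_{m(k)} \in h_\infty(x_{m(k)})$. □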

We are now ready to state the second stability theorem for an SRI given by (2) under a modified set of assumptions. We retain assumptions (A1)–(A3), replace (A4) by (A6) and, finally, in (A5) we let $h_\infty(x) := \overline{\mathrm{co}}\big(\operatorname{Limsup}_{c\to\infty} h_c(x)\big)$. We state the theorem under this updated set of assumptions.

Theorem 3 (Stability Theorem for DI #2). Under assumptions (A1)–(A3), (A5) (with $h_\infty(x) := \overline{\mathrm{co}}(\operatorname{Limsup}_{c\to\infty} h_c(x))$) and (A6), almost surely the sequence $\{x_n\}_{n \ge 0}$ generated by the stochastic recursive inclusion (2) is bounded and converges to a closed, connected, internally chain transitive, invariant set of $\dot{x}(t) \in h(x(t))$.

Proof. The statements of Lemmas 1–5 hold true even when $h_\infty := \overline{\mathrm{co}}(\operatorname{Limsup}_{c\to\infty} h_c)$ and (A5) is interpreted as explained earlier. The stability of the iterates can be proven in a manner identical to the proof of Theorem 2. Next, we invoke Theorem 3.6 and Lemma 3.8 of Benaïm, Hofbauer and Sorin [7] to conclude that the iterates converge to a closed, connected, internally chain transitive and invariant set of $\dot{x}(t) \in h(x(t))$. □


Remark 1. Assumptions (A4) and (A6) require that $\dot{x}(t) \in h_\infty(x(t))$ have an attractor set inside $B_1(0)$ (the open unit ball). Further, they require $B_1(0)$ to be in its fundamental neighborhood. Note that $h_\infty(x)$ is defined as $\operatorname{Liminf}_{c\to\infty} h_c(x)$ when using (A4), and as $\overline{\mathrm{co}}(\operatorname{Limsup}_{c\to\infty} h_c(x))$ when using (A6). Consider the following generalization of (A4)/(A6).

(A4)'/(A6)': $\dot{x}(t) \in h_\infty(x(t))$ has an attractor set $\mathcal{A}$ such that $\mathcal{A} \subseteq B_a(0)$ and $B_a(0)$ is a subset of its fundamental neighborhood, where $0 < a < \infty$.
Note that $a$ could be greater than 1; further, since $\mathcal{A}$ is compact by definition, $a$ is finite. A sufficient condition for (A4)'/(A6)' is that $\mathcal{A}$ be a globally attracting, Lyapunov stable set associated with $\dot{x}(t) \in h_\infty(x(t))$. In this case any compact set is a fundamental neighborhood of $\mathcal{A}$.

At the beginning of Section 3 we constructed the rescaled trajectory by projecting onto the unit ball around the origin. In order to use (A4)'/(A6)', we build the rescaled trajectory by projecting onto $B_a(0)$ instead. The proofs can be modified such that the statements of Theorems 2 and 3 remain true under assumptions (A1)–(A3), (A4)'/(A6)' and (A5).


Remark 2. The advantage of using (A4)'/(A6)' is that one can conclude the stability of the iterates merely from the knowledge that the DI associated with the infinity system has a global attractor set. Consider the following trivial example of a stochastic gradient descent algorithm with linear gradient function of the form $-(Ax + b)$. The corresponding infinity system, $\dot{x}(t) = -Ax(t)$, is clearly “related” to the associated o.d.e. $\dot{x}(t) = -(Ax(t) + b)$. Specifically, if there is a unique global minimizer, then both of the aforementioned o.d.e.'s have a global attractor, which in turn implies the stability of the iterates as discussed before. This trivial example also illustrates a finer point: $h_\infty$ and $h$ could be related, hence information about $h$ could help us ascertain whether (A4)'/(A6)' is satisfied. Whenever possible, one could also construct Lyapunov functions to ascertain the same. While we did not consider Lyapunov-type conditions for stability, it would be interesting to extend the Lyapunov-type stability conditions developed for SREs by Andrieu, Priouret and Moulines [1] to include SRIs.
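The linear example in Remark 2 can be checked numerically. The sketch below is an illustration under stated assumptions, not code from the paper: it takes hypothetical data $A$ (symmetric positive definite) and $b$, so that $F(x) = \tfrac12 x^T A x + b^T x$ has a unique global minimizer and the drift is $h(x) = -(Ax + b)$. It verifies that $h_c(x) = -(Ax + b/c)$ approaches the infinity system $-Ax$ at rate $\|b\|/c$, and that every eigenvalue of $-A$ has negative real part, so the origin is the global attractor of $\dot{x}(t) = -Ax(t)$.

```python
import numpy as np

# Hypothetical data: F(x) = 0.5 x^T A x + b^T x with A symmetric positive definite,
# so the gradient is A x + b and the mean field of gradient descent is h(x) = -(A x + b).
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])
b = np.array([1.0, -3.0])


def h(x):
    return -(A @ x + b)


def h_c(x, c):
    # Scaled field h_c(x) = h(c x) / c = -(A x + b / c).
    return h(c * x) / c


def h_inf(x):
    # Infinity system: the affine term vanishes as c -> infinity.
    return -(A @ x)


# x' = -A x has the origin as globally asymptotically stable equilibrium
# iff every eigenvalue of -A has a negative real part.
infinity_system_stable = bool(np.all(np.linalg.eigvals(-A).real < 0))

# Unique equilibrium of x' = -(A x + b), i.e. the global minimiser of F.
x_star = np.linalg.solve(A, -b)

x = np.array([0.3, -0.7])
gaps = [float(np.linalg.norm(h_c(x, c) - h_inf(x))) for c in (1e1, 1e3, 1e5)]
print(infinity_system_stable, x_star, gaps)  # gaps shrink like ||b|| / c
```

The same eigenvalue test decides stability of both o.d.e.'s here, which is the sense in which $h$ and $h_\infty$ are “related” in this example.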
4 Extensions to the stability theorem of Borkar and Meyn

We begin this section by listing the assumptions (see Section 2 of [13]) and the statement of the Borkar-Meyn Theorem (see Section 2.1 of [13]). The notations used are consistent with those of equation (1).

(BM1) (i) The function $h: \mathbb{R}^d \to \mathbb{R}^d$ is Lipschitz continuous, with Lipschitz constant $L$. There exists a function $h_\infty: \mathbb{R}^d \to \mathbb{R}^d$ such that $\lim_{c\to\infty} \frac{h(cx)}{c} = h_\infty(x)$, for each $x \in \mathbb{R}^d$.

(ii) $h_c \to h_\infty$ uniformly on compacts, as $c \to \infty$.

(iii) The o.d.e. $\dot{x}(t) = h_\infty(x(t))$ has the origin as the unique globally asymptotically stable equilibrium.

(BM2) $\{a(n)\}_{n \ge 0}$ is a scalar sequence such that $a(n) > 0$, $\sum_{n \ge 0} a(n) = \infty$ and $\sum_{n \ge 0} a(n)^2 < \infty$. Without loss of generality, we also let $\sup_n a(n) \le 1$.

(BM3) $\{M_n\}_{n \ge 1}$ is a martingale difference sequence with respect to the filtration $\mathcal{F}_n := \sigma(x_0, M_1, \ldots, M_n)$, $n \ge 0$. Thus, $E[M_{n+1} \mid \mathcal{F}_n] = 0$ a.s., $\forall\, n \ge 0$. $\{M_n\}$ is also square integrable with $E[\|M_{n+1}\|^2 \mid \mathcal{F}_n] \le L(1 + \|x_n\|^2)$, for some constant $L > 0$. Without loss of generality, assume that the same constant, $L$, works for both (BM1)(i) and (BM3).

Theorem 4 (Borkar-Meyn Theorem). Suppose (BM1)–(BM3) hold. Then $\sup_n \|x_n\| < \infty$ almost surely. Further, the sequence $\{x_n\}$ converges almost surely to a (possibly sample path dependent) compact, connected, internally chain transitive, invariant set of $\dot{x}(t) = h(x(t))$.

In what follows, we illustrate a weakening of (BM1)–(BM3) stated above using Theorems 2 and 3. Note that (BM2) is the standard step-size assumption, while (BM3) is the assumption on the martingale difference noise; we endeavor to weaken (BM1).

4.1 Superfluity of (BM1)(ii) as a consequence of Theorem 2

In this section we discuss in brief how the Borkar-Meyn Theorem (Theorem 4) can be proven under (BM1)(i), (iii), (BM2) and (BM3). In other words, we show that (BM1)(ii) is superfluous. In this direction we begin by showing the following: a recursion given by (1) that satisfies (BM1)(i), (iii), (BM2) and (BM3) also satisfies (A1)–(A5) of Section 2.2. The following implications are straightforward: (BM1)(i), (iii) $\Rightarrow$ (A1) & (A4); (BM2) $\Rightarrow$ (A2); (BM3) $\Rightarrow$ (A3). We now show (BM1)(i), (iii) $\Rightarrow$ (A5). Given $x_n \to x$, $c_n \uparrow \infty$ and $h_{c_n}(x_n) \to y$, we need to show $y = h_\infty(x)$. We have the following:

$$\|h_{c_n}(x_n) - h_\infty(x)\| \le \|h_{c_n}(x_n) - h_{c_n}(x)\| + \|h_{c_n}(x) - h_\infty(x)\|.$$

If $h$ is Lipschitz with constant $L$, then it can be shown that $h_c$ ($h_c: x \mapsto h(cx)/c$, $x \in \mathbb{R}^d$) is Lipschitz, for every $c \ge 1$, with the same constant. Further, $h_{c_n}(x) \to h_\infty(x)$ as $c_n \uparrow \infty$. Taking limits ($c_n \uparrow \infty$) on both sides of the above inequality gives $\lim_{c_n \uparrow \infty} h_{c_n}(x_n) = h_\infty(x)$, as required. Since (A1)–(A5) are satisfied, it follows from Theorem 2 that an SRE satisfying (BM1)(i), (iii), (BM2) and (BM3) is stable and converges to a closed, connected, internally chain transitive and invariant set of $\dot{x}(t) = h(x(t))$.
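For completeness, the two estimates used in this argument can be written out as follows; $z$ denotes an arbitrary second point of $\mathbb{R}^d$ (notation introduced only for this display):

$$
\begin{aligned}
\|h_c(x) - h_c(z)\| &= \tfrac{1}{c}\,\|h(cx) - h(cz)\| \le \tfrac{L}{c}\,\|cx - cz\| = L\,\|x - z\|, \qquad c \ge 1,\\
\|h_{c_n}(x_n) - h_\infty(x)\| &\le L\,\|x_n - x\| + \|h_{c_n}(x) - h_\infty(x)\| \longrightarrow 0 \quad (n \to \infty).
\end{aligned}
$$

Only the pointwise limit in (BM1)(i) enters the second line, which is why the uniform convergence in (BM1)(ii) is not needed for (A5).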

We discuss in brief how we avoid using (BM1)(ii) in proving the Borkar-Meyn Theorem. The notations used herein are consistent with those found in Chapter 3 of Borkar [10]. We list a few below for easy reference.

1. $\phi_n(\cdot, x)$ denotes the solution to $\dot{x}(t) \in h_{r(n)}(x(t))$ with initial value $x$.

2. $\phi_\infty(\cdot, x)$ denotes the solution to $\dot{x}(t) \in h_\infty(x(t))$ with initial value $x$.

3. $x^n(t)$, $t \in [0, T]$, denotes the solution to $\dot{x}^n(t) = h_{r(n)}(x^n(t))$ with initial value $x^n(0) = x(T_n)$. Then $x^n(t) = \phi_n(t, x(T_n))$, $t \in [0, T]$.

In proving the Borkar-Meyn Theorem as outlined in [13], (BM1)(ii) is used to show that for large values of $r(n)$, $\phi_n(t, x(T_n))$ is ‘close’ to $\phi_\infty(t, x(T_n))$, $t \in [0, T]$. In this paper we deviate from [13] in the definition of $x^n(t)$, $t \in [0, T]$: here $x^n(\cdot)$ denotes the solution up to time $T$ of $\dot{x}^n(t) = y(T_n + t) = h_{r(n)}(x([T_n + t]))$ with $x^n(0) = x(T_n)$, where $[\cdot]$ is defined in Lemma [S). In other words, we have the following:

$$x^n(t) = x(T_n) + \int_0^t h_{r(n)}\big(x([T_n + s])\big)\,ds, \qquad t \in [0, T].$$



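For $t \in [t(n), t(n+1))$, $y(t)$ is constant and equals $y(t(n))$. We get the following, where $k \ge 0$ is such that $T_n + t \in [t(m(n)+k), t(m(n)+k+1))$:

$$x^n(t) = x(T_n) + \sum_{l=0}^{k-1} a(m(n)+l)\, h_{r(n)}\big(x([t(m(n)+l)])\big) + \big(T_n + t - t(m(n)+k)\big)\, h_{r(n)}\big(x([t(m(n)+k)])\big).$$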
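The piecewise-linear formula for $x^n(t)$ can be evaluated directly. The sketch below is a hypothetical helper (not code from [13] or Borkar [10]) that measures time from $T_n$: `steps[l]` plays the role of $a(m(n)+l)$ and `slopes[l]` the role of $h_{r(n)}(x([t(m(n)+l)]))$, both assumed to be already computed.

```python
from bisect import bisect_right
from itertools import accumulate


def interpolated_trajectory(x_start, steps, slopes, t):
    """Evaluate x^n(t) for 0 <= t <= sum(steps), with time measured from T_n.

    steps[l]  : a(m(n) + l), the step sizes inside the current window,
    slopes[l] : h_{r(n)}(x([t(m(n) + l)])), the constant right-hand side
                on [t(m(n) + l), t(m(n) + l + 1)).
    Returns x(T_n) + sum_{l<k} steps[l] * slopes[l] + (t - grid[k]) * slopes[k],
    where grid[k] = t(m(n) + k) - T_n <= t < grid[k + 1].
    """
    grid = [0.0, *accumulate(steps)]                  # t(m(n) + l) - T_n
    k = min(bisect_right(grid, t) - 1, len(steps) - 1)
    partial = sum(a * g for a, g in zip(steps[:k], slopes[:k]))
    return x_start + partial + (t - grid[k]) * slopes[k]


steps = [0.5, 0.25, 0.25]
slopes = [1.0, -2.0, 4.0]
values = [interpolated_trajectory(0.0, steps, slopes, t) for t in (0.0, 0.5, 0.625, 0.75, 1.0)]
print(values)  # [0.0, 0.5, 0.25, 0.0, 1.0]
```

At the grid points the value equals the running sum of $a(m(n)+l)\,h_{r(n)}(\cdot)$, and between them it is linear, so the trajectory is continuous on $[0, T]$.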


The proof now proceeds along the lines of Section 3, i.e., Lemmas 1–5 and Theorem 2. We essentially show the following: if $r(n) \uparrow \infty$, then the $T$-length trajectories given by $\{x^n(\cdot)\}_{n \ge 0}$ have $\phi_\infty(t, x)$, $t \in [0, T]$, as a limit point in $C([0, T], \mathbb{R}^d)$, where $x \in B_1(0)$. This is proven in Lemmas 2] and 0, the proofs of which do not require (BM1)(ii).

4.2 Further weakening of (BM1) as a consequence of Theorem 3

In this section we use the second stability theorem (Theorem 3) to answer the following question: if $\lim_{c\to\infty} h_c(x)$ does not exist for all $x \in \mathbb{R}^d$, then what are the sufficient conditions for the stability and convergence of the algorithm?

Taking our cue from assumption (A6), we replace (BM1) with the following assumption; call it (BM4).

(BM4)(i) The function $h: \mathbb{R}^d \to \mathbb{R}^d$ is Lipschitz continuous, with Lipschitz constant $L$. Define the set-valued map $h_\infty(x) := \overline{\mathrm{co}}\big(\operatorname{Limsup}_{c\to\infty}\{h_c(x)\}\big)$, where $x \in \mathbb{R}^d$.

Note that $\operatorname{Limsup}_{c\to\infty}\{h_c(x)\} = \{y \mid \liminf_{c\to\infty} \|h_c(x) - y\| = 0\}$.

(BM4)(ii) $\dot{x}(t) \in h_\infty(x(t))$ has an attracting set, $\mathcal{A}$, with $B_1(0)$ as a subset of its fundamental neighborhood. This attracting set is such that $\mathcal{A} \subseteq B_1(0)$.

Observe that $\operatorname{Limsup}_{c\to\infty}\{h_c(x)\} = \{\lim_{c\to\infty} h_c(x)\}$ when $\lim_{c\to\infty} h_c(x)$ exists. Recall the definition of Limsup, the upper-limit of a sequence of sets, from Section 2.1. It can be shown that if a recursion given by (1) satisfies assumptions (BM1)(i) and (BM1)(iii), then it also satisfies (BM4)(i), (ii). Assumption (BM4) unifies the two possible cases: when the limit of $h_c$, as $c \to \infty$, exists for each $x \in \mathbb{R}^d$, and when it does not.

We claim that a recursion given by (1) satisfying assumptions (BM2), (BM3) and (BM4) will also satisfy (A1)–(A3), (A6) and (A5) (see Section 3). From Theorem 3 it follows that the iterates are stable and converge to a closed, connected, internally chain transitive and invariant set of $\dot{x}(t) = h(x(t))$. The following generalization of the Borkar-Meyn Theorem is a direct consequence of Theorem 3.

Corollary 2 (Generalized Borkar-Meyn Theorem). Under assumptions (BM2), (BM3) and (BM4), almost surely the sequence $\{x_n\}_{n \ge 0}$ generated by the stochastic recursive equation (1) is bounded and converges to a closed, connected, internally chain transitive and invariant set of $\dot{x}(t) = h(x(t))$.

Proof. Assumptions (A1)–(A3) and (A6) follow directly from (BM2), (BM3) and (BM4). We show that (A5) is also satisfied. Let $c_n \uparrow \infty$, $x_n \to x$, $y_n \to y$ and $y_n \in h_{c_n}(x_n)$ (here $y_n = h_{c_n}(x_n)$), $\forall\, n \ge 1$. It can be shown that $\|h_{c_n}(x_n) - h_{c_n}(x)\| \le L\|x_n - x\|$. Hence we get that $h_{c_n}(x) \to y$. In other words, $\liminf_{c\to\infty} \|h_c(x) - y\| = 0$. Hence we have $y \in h_\infty(x)$. The claim now follows from Theorem 3. □

5 Applications: The problem of approximate drifts & stochastic gradient descent

5.1 The problem of approximate drifts 

Let us recall the standard SRE: 


$$x_{n+1} = x_n + a(n)\big(h(x_n) + M_{n+1}\big), \tag{9}$$

where $h: \mathbb{R}^d \to \mathbb{R}^d$ is Lipschitz continuous, $\{a(n)\}_{n \ge 0}$ is the step-size sequence and $\{M_n\}_{n \ge 1}$ is the noise sequence.

The function $h$ is colloquially referred to as the drift. In many applications the drift function cannot be calculated accurately. This is referred to as the approximate drift problem. For more details the reader is referred to Chapter 5.3 of Borkar [10]. Suppose the room for error is at most $\epsilon$ ($> 0$); then such an algorithm can be characterized by the following stochastic recursive inclusion:

$$x_{n+1} = x_n + a(n)\big(y_n + M_{n+1}\big), \tag{10}$$
where $y_n \in h(x_n) + \overline{B}_\epsilon(0)$ is an estimate of $h(x_n)$ and $\overline{B}_\epsilon(0)$ is the closed ball of radius $\epsilon$ around the origin. We define a new set-valued map, called the approximate drift, by $H(x) := h(x) + \overline{B}_\epsilon(0)$ for each $x \in \mathbb{R}^d$. In the following discussion we assume that $\epsilon > 0$. When $\epsilon = 0$, the approximate drift algorithm described by (10) is really the SRE given by (9).
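To make (10) concrete, the sketch below simulates an approximate-drift recursion under assumptions chosen only for illustration (none of them come from this paper): $h(x) = -x$ on $\mathbb{R}^2$, $a(n) = 1/(n+1)$, Gaussian martingale noise, and a worst-case error $z_n = \epsilon\, x_n/\|x_n\|$ that always points away from the origin. For this choice every solution of $\dot{x}(t) \in -x(t) + \overline{B}_\epsilon(0)$ satisfies $\frac{d}{dt}\|x(t)\| \le -\|x(t)\| + \epsilon$ away from the origin, so its invariant sets lie in $\overline{B}_\epsilon(0)$; the outward error should drive the iterates toward the sphere $\|x\| = \epsilon$.

```python
import math
import random


def approximate_drift_sri(x0, eps=0.5, n_steps=100_000, seed=0):
    """Simulate x_{n+1} = x_n + a(n) (y_n + M_{n+1}) with y_n in h(x_n) + closed B_eps(0).

    Illustrative assumptions (not from the paper): h(x) = -x on R^2,
    a(n) = 1/(n+1), M_{n+1} i.i.d. standard Gaussian, and the drift error
    z_n = eps * x_n / ||x_n||, which always pushes away from the origin.
    Returns the final iterate and the largest norm seen along the run.
    """
    rng = random.Random(seed)
    x1, x2 = x0
    max_norm = math.hypot(x1, x2)
    for n in range(n_steps):
        a = 1.0 / (n + 1)
        r = math.hypot(x1, x2)
        z1, z2 = (eps * x1 / r, eps * x2 / r) if r > 0 else (0.0, 0.0)
        y1, y2 = -x1 + z1, -x2 + z2          # y_n = h(x_n) + z_n with ||z_n|| <= eps
        x1 += a * (y1 + rng.gauss(0.0, 1.0))
        x2 += a * (y2 + rng.gauss(0.0, 1.0))
        max_norm = max(max_norm, math.hypot(x1, x2))
    return (x1, x2), max_norm


final, max_norm = approximate_drift_sri((10.0, -10.0))
print(math.hypot(*final), max_norm)  # final norm is close to eps = 0.5
```

The run stays bounded and ends near the boundary of $\overline{B}_\epsilon(0)$, consistent with convergence to an invariant set of $\dot{x}(t) \in H(x(t))$ rather than to the zero of $h$.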

In this section we show the following: if (9) satisfies (BM2), (BM3) and (BM4), then the corresponding approximate drift version given by (10) satisfies (A1)–(A3), (A5) and (A6). For details on (BM2) and (BM3) see Section 4; see Section 4.2 for (BM4). We then invoke Theorem 3 to conclude that the iterates converge to a closed, connected, internally chain transitive and invariant set associated with $\dot{x}(t) \in h(x(t)) + \overline{B}_\epsilon(0)$ ($= H(x(t))$).

For the remainder of this section it is assumed that (9) satisfies (BM2), (BM3) and (BM4).

Proposition 3. $H(x) = h(x) + \overline{B}_\epsilon(0)$ is a Marchaud map. Further, recursion (10) satisfies (A1), (A2) and (A3).

Proof. Since $\overline{B}_\epsilon(0)$ is convex and compact, it follows that $H(x)$ is convex and compact for each $x \in \mathbb{R}^d$. Fix $x \in \mathbb{R}^d$ and $y \in H(x)$; then $\|y\| \le \|h(x)\| + \epsilon$ and $\|y\| \le \|h(0)\| + L\|x - 0\| + \epsilon$, since $h$ is Lipschitz continuous with Lipschitz constant $L$. If we set $K := (\|h(0)\| + \epsilon) \vee L$, then we get $\|y\| \le K(1 + \|x\|)$. This shows that $H$ is point-wise bounded. To show the upper semi-continuity of $H$, assume $\lim_{n\to\infty} x_n = x$, $\lim_{n\to\infty} y_n = y$ and $y_n \in H(x_n)$ for each $n \ge 1$. For all $n \ge 1$, $y_n = h(x_n) + z_n$ for some $z_n \in \overline{B}_\epsilon(0)$. Further, $h(x_n) \to h(x)$ as $x_n \to x$. Since both $\{y_n\}_{n \ge 1}$ and $\{h(x_n)\}_{n \ge 1}$ are convergent sequences, $\{z_n\}_{n \ge 1}$ is also convergent. Let $z := \lim_{n\to\infty} z_n$; then $z \in \overline{B}_\epsilon(0)$, since $\overline{B}_\epsilon(0)$ is compact. Taking limits on both sides of $y_n = h(x_n) + z_n$, we get $y = h(x) + z$. Thus $y \in H(x)$.

Since (9) is assumed to satisfy (BM2) and (BM3), it trivially follows that (10) satisfies (A2) and (A3). □

Before showing that (10) satisfies (A6), we construct the following family of set-valued maps:

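$$H_c(x) := \left\{ \frac{h(cx)}{c} + \frac{y}{c} \;\middle|\; y \in \overline{B}_\epsilon(0) \right\}, \tag{11}$$

where $c \ge 1$ and $x \in \mathbb{R}^d$. In other words, $H_c(x) = h_c(x) + \overline{B}_{\epsilon/c}(0)$ for each $x \in \mathbb{R}^d$.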
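The identity $H_c(x) = h_c(x) + \overline{B}_{\epsilon/c}(0)$ follows in one line from (11), using $h_c(x) = h(cx)/c$:

$$
H_c(x) = \left\{ \frac{h(cx)}{c} + \frac{y}{c} \;\middle|\; \|y\| \le \epsilon \right\}
= \frac{h(cx)}{c} + \left\{ w \;\middle|\; \|w\| \le \frac{\epsilon}{c} \right\}
= h_c(x) + \overline{B}_{\epsilon/c}(0).
$$

Consequently every point of $H_c(x)$ lies within $\epsilon/c$ of $h_c(x)$, a bound that vanishes as $c \to \infty$; this is the estimate behind $\|y_n - h_{c_n}(x_n)\| \le \epsilon/c_n$ in the proofs that follow.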

Proposition 4. (10) satisfies (A6).

Proof. To prove this it is enough to show that $H_\infty(x) = h_\infty(x)$, where $H_\infty(x) := \operatorname{Limsup}_{c\to\infty} H_c(x)$ and $h_\infty(x) := \operatorname{Limsup}_{c\to\infty}\{h_c(x)\}$. Since $\dot{x}(t) \in h_\infty(x(t))$ satisfies (BM4)(ii), it trivially follows that (A6) is satisfied by (10). Note that (BM4)(ii) and (A6) essentially say the same thing.
First we show $h_\infty(x) \subseteq H_\infty(x)$ for every $x \in \mathbb{R}^d$. Let $y \in h_\infty(x)$; then there exists $c_n \uparrow \infty$ such that $h_{c_n}(x) \to y$ as $c_n \uparrow \infty$. Since $h_{c_n}(x) \in H_{c_n}(x)$, it follows from the definition of Limsup that $y \in H_\infty(x)$. To show $H_\infty(x) \subseteq h_\infty(x)$, we start by assuming the negation, i.e., for some $x \in \mathbb{R}^d$ there exists $y \in H_\infty(x)$ such that $y \notin h_\infty(x)$.

Let $c_n \uparrow \infty$ and $y_n \in H_{c_n}(x_n)$ be such that $\lim_{c_n \uparrow \infty} y_n = y$. Since $\|y_n - h_{c_n}(x_n)\| \le \epsilon/c_n$, we have $\lim_{c_n \uparrow \infty} h_{c_n}(x_n) = y$. We have the following:

$$\|y - h_{c_n}(x)\| \le \|y - h_{c_n}(x_n)\| + \|h_{c_n}(x_n) - h_{c_n}(x)\|.$$

Taking limits on both sides, we get that $\|y - h_{c_n}(x)\| \to 0$, i.e., $y \in h_\infty(x)$. This is a contradiction. □


Proposition 5. (10) satisfies (A5).

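Proof. Given $c_n \uparrow \infty$, $x_n \to x$, $y_n \to y$ and $y_n \in H_{c_n}(x_n)$ $\forall n$, we need to show that $y \in H_\infty(x)$. As in the proof of Proposition 4, we have $\lim_{c_n \uparrow \infty} h_{c_n}(x_n) = y$. Since $\|h_{c_n}(x_n) - h_{c_n}(x)\| \le L\|x_n - x\|$, we have that $\lim_{c_n \uparrow \infty} \|h_{c_n}(x_n) - h_{c_n}(x)\| = 0$ and $\lim_{c_n \uparrow \infty} h_{c_n}(x) = y$. In other words, $y \in h_\infty(x)$. In Proposition 4 we have shown that $h_\infty = H_\infty$; therefore $y \in H_\infty(x)$. □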


Corollary 3. If an SRE given by (9) satisfies (BM2), (BM3) and (BM4)(i), (ii), then the corresponding approximate drift version given by (10) is stable almost surely. In addition, it converges to a closed, connected, invariant and internally chain transitive set of $\dot{x}(t) \in H(x(t))$, where $H(x) = h(x) + \overline{B}_\epsilon(0)$.

Proof. In Propositions 3, 4 and 5 we have shown that (10) satisfies (A1)–(A3), (A5) and (A6); the statement now follows directly from Theorem 3. □


Remark 3. In the context of (9), we have that $h$ is Lipschitz and $h_c: x \mapsto h(cx)/c$, $x \in \mathbb{R}^d$. Supposing $\lim_{c\to\infty} h_c(x)$ exists for every $x \in \mathbb{R}^d$ (see (BM1)(i) in Section 4), then $\{\lim_{c\to\infty} h_c(x)\} = \operatorname{Limsup}_{c\to\infty}\{h_c(x)\}$. Further, $\operatorname{Limsup}_{c\to\infty}\{h_c(x)\}$ is non-empty for every $x \in \mathbb{R}^d$ (since $\|h_c(x)\| \le K(1 + \|x\|)$, $c \ge 1$), even if $\lim_{c\to\infty} h_c(x)$ does not exist for some $x \in \mathbb{R}^d$. Hence the analysis of the approximate drift problem in this section is all-encompassing. The aforementioned is also the reason why in Section 4.2 we define $h_\infty(x) := \overline{\mathrm{co}}\big(\operatorname{Limsup}_{c\to\infty}\{h_c(x)\}\big)$. It may be noted that we use $\operatorname{Limsup}_{c\to\infty}\{h_c(x)\}$ instead of $\operatorname{Limsup}_{c\to\infty} h_c(x)$, since Limsup acts on sets and $h$ (in this context) is a function that is not set-valued. Finally, in Corollary 3, if we let $\epsilon = 0$, then we may derive Corollary 2.
5.2 Stochastic gradient descent

Stochastic gradient descent is a gradient descent optimization technique to find the minimum set of a (continuously) differentiable function. Suppose we want to find the minimum of $F: \mathbb{R}^d \to \mathbb{R}$, for which we can run the following SRE:

$$x_{n+1} = x_n - a(n)\big[\nabla F(x_n) + M_{n+1}\big], \tag{12}$$

where $\nabla F: \mathbb{R}^d \to \mathbb{R}^d$ is upper-semicontinuous and $\|\nabla F(x)\| \le K(1 + \|x\|)$ $\forall x \in \mathbb{R}^d$ (point-wise bounded). $\{a(n)\}_{n \ge 0}$ is the given step-size sequence and $\{M_n\}_{n \ge 1}$ is the martingale difference noise sequence. If the assumptions of Benaïm, Hofbauer and Sorin [7] are satisfied by (12), then the iterates converge to a closed, connected, internally chain transitive and invariant set of $\dot{x}(t) = -\nabla F(x(t))$, which is also the minimum set of $F$. In this section we shall not distinguish between the asymptotic attracting set of $\dot{x}(t) = -\nabla F(x(t))$ and the minimum set of $F$.

As explained in Section 5.1, while implementing (12) one can only hope to calculate an approximate value of the gradient at each step. However, one has control over the “approximation error”. This is typical when gradient estimators with fixed perturbation parameters are used; it could also be a consequence of the limited computational precision of the computer used to run the algorithm. In reality one is running the following SRI:

$$x_{n+1} = x_n + a(n)\big[y_n + M_{n+1}\big], \tag{13}$$

where $y_n \in -\nabla F(x_n) + \overline{B}_\epsilon(0)$ and $\epsilon > 0$ is the “approximation error”. The following questions are natural:

1. Are the iterates stable? 

2. If so, where do they converge? 
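Both questions can be explored numerically on a toy problem. The sketch below runs (13) under assumptions chosen only for illustration (not from this paper): $F(x) = \tfrac12 x^T Q x$ with the hypothetical matrix $Q = \begin{pmatrix} 3 & 1 \\ 1 & 2 \end{pmatrix}$ (symmetric positive definite, minimum set $\{0\}$; the letter $Q$ avoids a clash with the attractor $\mathcal{A}$), $a(n) = 1/(n+1)$, Gaussian martingale noise, and a constant gradient error $z = (\epsilon, 0)$. The iterates should remain bounded and settle near $Q^{-1}z$, a point of the attractor of $\dot{x}(t) \in -Qx(t) + \overline{B}_\epsilon(0)$ rather than the minimizer itself.

```python
import math
import random


def sgd_constant_error(x0, eps=0.2, n_steps=100_000, seed=1):
    """Iterate x_{n+1} = x_n + a(n) [y_n + M_{n+1}] with y_n in -grad F(x_n) + closed B_eps(0).

    Illustrative assumptions (not from the paper): F(x) = 0.5 x^T Q x with
    Q = [[3, 1], [1, 2]] (symmetric positive definite), a(n) = 1/(n+1),
    M_{n+1} i.i.d. standard Gaussian, and a constant gradient error z = (eps, 0).
    """
    q11, q12, q22 = 3.0, 1.0, 2.0
    rng = random.Random(seed)
    x1, x2 = x0
    for n in range(n_steps):
        a = 1.0 / (n + 1)
        g1 = q11 * x1 + q12 * x2          # exact gradient Q x
        g2 = q12 * x1 + q22 * x2
        y1, y2 = -g1 + eps, -g2           # y_n = -grad F(x_n) + z with ||z|| = eps
        x1 += a * (y1 + rng.gauss(0.0, 1.0))
        x2 += a * (y2 + rng.gauss(0.0, 1.0))
    return x1, x2


x1, x2 = sgd_constant_error((5.0, -5.0))
# The o.d.e. x' = -Q x + z has the equilibrium Q^{-1} z = (0.08, -0.04).
print(round(x1, 3), round(x2, 3), math.hypot(x1, x2))
```

The constant error is the worst case for bias: it never averages out, so the limit is displaced from the minimizer by $Q^{-1}z$, whose norm is at most $\epsilon/\lambda_{\min}(Q)$.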

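Define the following set-valued map: $H: x \mapsto -\nabla F(x) + \overline{B}_\epsilon(0)$. As in (11), we define $H_c(x) := \left\{ \frac{-\nabla F(cx)}{c} + \frac{z}{c} \;\middle|\; z \in \overline{B}_\epsilon(0) \right\}$ and $H_\infty(x) := \operatorname{Limsup}_{c\to\infty} H_c(x) = \operatorname{Limsup}_{c\to\infty}\left\{ \frac{-\nabla F(cx)}{c} + \frac{z}{c} \;\middle|\; z \in \overline{B}_\epsilon(0) \right\}$. Recall the definition of Limsup from Section 2.1.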

Proposition 6. (13) satisfies (A1), i.e., $H$ is a Marchaud map.

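Proof. Given $x_n \to x$, $y_n \to y$ and $y_n \in H(x_n)$ $\forall n$, we need to show that $y \in H(x)$. For each $n$ we have $y_n = -\nabla F(x_n) + z_n$, where $z_n \in \overline{B}_\epsilon(0)$. Since $\nabla F$ is point-wise bounded, it follows that $\{-\nabla F(x_n)\}$ is a bounded sequence. Let $\{n(m)\} \subseteq \mathbb{N}$ be such that $\nabla F(x_{n(m)}) \to \nabla F(x)$ and $y_{n(m)} \to y$. The subsequence $z_{n(m)} \to z$ for some $z \in \overline{B}_\epsilon(0)$, i.e.,

$$-\nabla F(x_{n(m)}) + z_{n(m)} \to -\nabla F(x) + z \in H(x).$$

□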

If, in addition to (A1), equation (13) also satisfies (A2), (A3), (A5) and (A6), then it follows from Theorem 3 that the iterates are stable and converge to a closed, connected, internally chain transitive and invariant set of $\dot{x}(t) \in -\nabla F(x(t)) + \overline{B}_\epsilon(0)$.
Suppose $F$ has the quadratic form $x^T A x + Bx + c$, where $A$ is a positive definite matrix, $B$ is some matrix and $c$ is some vector. Then it can be shown that (A1), (A2), (A3), (A5) and (A6) are satisfied by (13), and the iterates are stable and converge to a closed, connected, internally chain transitive and invariant set of $\dot{x}(t) \in -(Ax(t) + B) + \overline{B}_\epsilon(0)$. If the comments in Remark 1 are incorporated, i.e., we use (A6)' instead of (A6), then the matrix $A$ need not be positive definite anymore.

For the purpose of this discussion, assume that $\nabla F$ is Lipschitz continuous. The graph of a set-valued map $H: \mathbb{R}^d \to \{\text{subsets of } \mathbb{R}^d\}$ is given by $\mathrm{Graph}(H) = \{(x, y) \mid x \in \mathbb{R}^d,\ y \in H(x)\}$. It is easy to see that $\mathrm{Graph}(-\nabla F + \overline{B}_\epsilon(0)) \subseteq N^\epsilon(\mathrm{Graph}(-\nabla F))$. Let us also assume that $\mathcal{A}$ is the global attractor (minimum set of $F$) of $\dot{x}(t) = -\nabla F(x(t))$; then every compact subset of $\mathbb{R}^d$ is its fundamental neighborhood. It follows from the stability of the iterates that they will remain within a compact subset, say $U$, that may be sample path dependent. It follows from Theorem 2.1 of Benaïm, Hofbauer and Sorin [5] that for all $\delta > 0$ there exists $\epsilon > 0$ such that $\mathcal{A}_\epsilon \subseteq N^\delta(\mathcal{A})$ is the attractor set of $\dot{x}(t) \in -\nabla F(x(t)) + \overline{B}_\epsilon(0)$. Further, the fundamental neighborhood of $\mathcal{A}_\epsilon$ is $U$ itself. In other words, if we want to ensure convergence of the iterates to a $\delta$-neighborhood of the minimum set $\mathcal{A}$, then the “approximation error” should be at most $\epsilon$ ($\epsilon$ is dependent on $\delta$).
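For the hypothetical quadratic $F(x) = \tfrac12 x^T Q x$ with $Q$ symmetric positive definite (minimum set $\{0\}$; $Q$ is used instead of $A$ to avoid a clash with the attractor $\mathcal{A}$), the dependence of $\epsilon$ on $\delta$ can be made explicit with the Lyapunov function $\tfrac12\|x\|^2$. This is only a sufficient condition for this special case; the general statement above relies on Theorem 2.1 of [5]. Along any solution of $\dot{x}(t) = -Qx(t) + z(t)$ with $\|z(t)\| \le \epsilon$:

$$
\frac{d}{dt}\,\tfrac12\|x(t)\|^2 = \langle x(t), -Qx(t) + z(t)\rangle \le -\lambda_{\min}(Q)\,\|x(t)\|^2 + \epsilon\,\|x(t)\|,
$$

which is negative whenever $\|x(t)\| > \epsilon/\lambda_{\min}(Q)$. Hence the closed ball of radius $\epsilon/\lambda_{\min}(Q)$ contains the global attractor of $\dot{x}(t) \in -Qx(t) + \overline{B}_\epsilon(0)$, and any $\epsilon < \delta\,\lambda_{\min}(Q)$ places that attractor inside $N^\delta(\{0\})$.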

6 Final discussion on the generality of our framework

As explained in Section 3, we run a projective scheme to show stability. In other words, time is divided into intervals of length $T$; the iterates are checked at the beginning of each time interval to see whether they are outside the unit ball; all the iterates corresponding to $[T_n, T_{n+1})$ are scaled by $r(n) = \|x(T_n)\| \vee 1$, i.e., the iterates are projected onto the unit ball around the origin. For $t(m(n)) = T_n \le t(m(n) + k) < T_{n+1}$ we have the following rescaled iterate:

$$\frac{x(t(m(n)+k))}{r(n)} = \frac{x(t(m(n)))}{r(n)} + \sum_{j=0}^{k-1} a(m(n)+j)\left(\frac{y_{m(n)+j}}{r(n)} + \frac{M_{m(n)+j+1}}{r(n)}\right).$$

In the above, $y_{m(n)+j}/r(n) \in h_{r(n)}\big(x(t(m(n)+j))/r(n)\big)$. Since we have to worry about $r(n)$ running off to infinity, it is natural to define $h_\infty(x)$ to include all accumulation points of $\{h_c(x) \mid c \ge 1,\ c \to \infty\}$. This is precisely what the Limsup function (see Section 2.1) allows us to do. In Lemma |S] it was shown that the scaled iterates track a solution to $\dot{x}(t) \in h_\infty(x(t))$ provided the original iterates are unstable, i.e., $\sup_n r(n) = \infty$. Assumptions (A4)/(A6) were never used up to this point. At this stage it seems natural to impose restrictions on $\dot{x}(t) \in h_\infty(x(t))$ to elicit the stability of the original iterates.
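The projective scheme can be sketched for scalar iterates as follows. This is a hypothetical helper, not code from the paper; it assumes the iterates and step sizes are given as lists, $t(m) = \sum_{k<m} a(k)$, and windows follow the usual construction $T_0 = 0$, $T_{n+1} = \min\{t(m) \mid t(m) \ge T_n + T\}$ (an assumption about how the windows are chosen).

```python
from itertools import accumulate


def rescale_iterates(xs, steps, T):
    """Projective rescaling of scalar iterates, one window of length T at a time.

    xs[m]    : iterate x_m = x(t(m)),
    steps[m] : step size a(m), so that t(m) = a(0) + ... + a(m - 1),
    T        : window length; a new window starts at the first t(m) >= T_n + T.
    Every iterate in [T_n, T_{n+1}) is divided by r(n) = max(|x(T_n)|, 1), so each
    window starts inside the closed unit ball. Also returns the start indices m(n).
    """
    times = [0.0, *accumulate(steps)]
    rescaled, starts = [], []
    r, window_end = 1.0, float("-inf")
    for m, x in enumerate(xs):
        if times[m] >= window_end:        # t(m) begins a new window T_n
            starts.append(m)
            r = max(abs(x), 1.0)          # r(n) = ||x(T_n)|| v 1
            window_end = times[m] + T
        rescaled.append(x / r)
    return rescaled, starts


xs = [5.0, 4.0, 0.5, 0.2, 3.0]
rescaled, starts = rescale_iterates(xs, steps=[0.5] * 4, T=1.0)
print(rescaled, starts)  # [1.0, 0.8, 0.5, 0.2, 1.0] [0, 2, 4]
```

Iterates inside a window may still leave the unit ball after rescaling; only the starting point of each window is projected, which is why the tracking argument of Section 3 is needed.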

As explained in Section 3, $\operatorname{Limsup}_{c\to\infty} h_c(x)$ is non-empty for every $x \in \mathbb{R}^d$, since $h$ is point-wise bounded. Further, $h_\infty = \overline{\mathrm{co}}(\operatorname{Limsup}_{c\to\infty} h_c)$ is shown to be Marchaud, and the DI $\dot{x}(t) \in h_\infty(x(t))$ has at least one solution. Assumption (A6) is the restriction referred to in the previous paragraph that is imposed to elicit the stability of the original iterates. On a related note, if $\operatorname{Liminf}_{c\to\infty} h_c$ were non-empty, then we could define $h_\infty = \operatorname{Liminf}_{c\to\infty} h_c$ and check whether (A4) is satisfied.

If the DI $\dot{x}(t) \in h_\infty(x(t))$ has a global attractor inside $B_1(0)$, then this is a sufficient condition for (A6) to hold; it then follows from Theorem 3 that the original iterates are stable and converge to a closed, connected, internally chain transitive set associated with $\dot{x}(t) \in h(x(t))$. More generally, in view of Remark 1, it is sufficient that the DI has some global attractor, not necessarily inside the unit ball, since (A6)' will then hold. This in turn implies stability.

In the case of the original Borkar-Meyn assumptions, (BM1)(i), (ii) (see Section 4) needed to be checked even before we could define $h_\infty$, while in our case we do not need any extra assumptions to define $h_\infty$. As explained before, constructing a global Lyapunov function for $h_\infty$ is one of many sufficient conditions that guarantee (A4)'/(A6)'. In the case of Lyapunov-type conditions for stability, additional properties of the constructed global Lyapunov function need to be verified before we get stability; see [1] for more details. However, to the best of our knowledge, there are no Lyapunov-type conditions that guarantee stability of stochastic approximation algorithms with set-valued mean fields (SRIs), the class of algorithms dealt with in this paper. Hence our assumptions are general and relatively easy to verify.


7 Conclusions 

An extension of the theorem of Borkar and Meyn was presented to include stochastic approximation algorithms with set-valued mean fields. Two different sets of sufficient conditions were presented that guarantee the ‘stability and convergence’ of stochastic recursive inclusions. As a consequence of Theorems 2 & 3, the original Borkar-Meyn theorem is shown to hold under weaker requirements. Further, as a consequence of Theorem 3, we obtained a solution to the “approximate drift” problem. Prior to this paper, there was no proof of stability of stochastic gradient descent algorithms that use constant-error gradient estimators. Hence one could only conclude that the iterates converge to a small neighborhood, say $N$, of the minimum set with very high probability. In Section 5.2 we used our framework to show the stability of the aforementioned algorithm, which in turn allowed us to conclude almost sure convergence to $N$.

An important future direction would be to extend these results to the case when the set-valued drift is governed by a Markov process in addition to the iterate sequence. For the case of stochastic approximations, such a situation has been considered in [10, Chapter 6], where the Markov ‘noise’ is tackled using the ‘natural timescale averaging’ properties of stochastic approximation. Finally, it would be interesting to develop Lyapunov-type assumptions for stability of stochastic algorithms with set-valued mean fields.